-
Notifications
You must be signed in to change notification settings - Fork 0
/
Metal基础概念-利用GPU高效计算.html
401 lines (320 loc) · 32.7 KB
/
Metal基础概念-利用GPU高效计算.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
<!DOCTYPE html>
<html>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/gitalk@1/dist/gitalk.css">
<script src="https://cdn.jsdelivr.net/npm/gitalk@1/dist/gitalk.min.js"></script>
<head>
<!-- Document Settings -->
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<!-- Page Meta -->
<title>Metal基础概念——利用GPU高效计算</title>
<meta name="description" content="Freelf's Blog" />
<!-- Mobile Meta -->
<meta name="HandheldFriendly" content="True" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- Brand icon -->
<link rel="shortcut icon" href="/assets/images/favicon.ico" >
<!-- Styles'n'Scripts -->
<link rel="stylesheet" type="text/css" href="/assets/css/screen.css" />
<link rel="stylesheet" type="text/css" href="//fonts.googleapis.com/css?family=Merriweather:300,700,700italic,300italic|Open+Sans:700,400" />
<link rel="stylesheet" type="text/css" href="/assets/css/syntax.css" />
<!-- highlight.js -->
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.3.0/styles/default.min.css">
<style>.hljs { background: none; }</style>
<!-- Ghost outputs important style and meta data with this tag -->
<link rel="canonical" href="//Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97" />
<meta name="referrer" content="origin" />
<link rel="next" href="/page2/" />
<meta property="og:site_name" content="面向自由编程" />
<meta property="og:type" content="website" />
<meta property="og:title" content="Metal基础概念——利用GPU高效计算" />
<meta property="og:description" content="Freelf's Blog" />
<meta property="og:url" content="//Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97" />
<meta property="og:image" content="/assets/images/cover1.jpg" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:title" content="Metal基础概念——利用GPU高效计算" />
<meta name="twitter:description" content="Freelf's Blog" />
<meta name="twitter:url" content="//Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97" />
<meta name="twitter:image:src" content="/assets/images/cover1.jpg" />
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Website",
"publisher": "面向自由编程",
"name": "Metal基础概念——利用GPU高效计算",
"url": "//Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97",
"image": "/assets/images/cover1.jpg",
"description": "Freelf's Blog"
}
</script>
<meta name="generator" content="Jekyll 3.0.0" />
<link rel="alternate" type="application/rss+xml" title="面向自由编程" href="/feed.xml" />
</head>
<body class="home-template nav-closed">
<!-- The blog navigation links -->
<div class="nav">
<h3 class="nav-title">Menu</h3>
<a href="#" class="nav-close">
<span class="hidden">Close</span>
</a>
<ul>
<li class="nav-home " role="presentation"><a href="/">Home</a></li>
<li class="nav-about " role="presentation"><a href="/about">About</a></li>
</ul>
<a class="subscribe-button icon-feed" href="/feed.xml">Subscribe</a>
</div>
<span class="nav-cover"></span>
<div class="site-wrapper">
<!-- All the main content gets inserted here, index.hbs, post.hbs, etc -->
<!-- default -->
<!-- The comment above "< default" means - insert everything in this file into -->
<!-- the [body] of the default.hbs template, which contains our header/footer. -->
<!-- Everything inside the #post tags pulls data fom the post -->
<!-- #post -->
<header class="main-header post-head no-cover">
<nav class="main-nav clearfix">
<a class="back-button icon-arrow-left" href="/">Home</a>
<a class="menu-button icon-menu" href="#"><span class="word">Menu</span></a>
</nav>
</header>
<main class="content" role="main">
<article class="post">
<header class="post-header">
<h1 class="post-title">Metal基础概念——利用GPU高效计算</h1>
<section class="post-meta">
<!-- <a href='/'>Freelf</a> -->
<!-- <a href='/author/Freelf'>Freelf</a> -->
<time class="post-date"
datetime="2019-08-02">02 Aug 2019</time>
<!-- [[tags prefix=" on "]] -->
<!-- on -->
<a href='/tag/Metal'>Metal</a>
</section>
</header>
<section class="post-content">
<p>Metal是Apple提供的一个可以让我们开发者直接和GPU对话的API。Metal在图像处理,并行计算方面通过发挥GPU的特性,极大提升了性能。</p>
<p>由于本人最近在一个画板需求中使用了Metal,做项目的时候由于Deadline的原因,当时并没有有系统学习。现在项目上线,回头系统学习一下Metal,顺便入门一下图像处理。本文主要是介绍一些Metal的基础概念,通过一系列项目来了解Metal。</p>
<h2 id="在gpu中进行计算">在GPU中进行计算</h2>
<p>通过这个例子,我们会介绍一些在所有Metal应用中都会用到的一些概念:</p>
<ol>
<li>编写可以在GPU中执行的函数</li>
<li>找到一个GPU</li>
<li>通过创建管道让第一步创建的函数可以在GPU上执行</li>
<li>创建GPU可以访问的内存保存数据</li>
<li>创建command buffer并且编码GPU指令来操作数据</li>
<li>提交command buffer到command queue来让GPU执行编码的指令</li>
</ol>
<p>既然是想利用GPU进行一些操作,我们通过下图来看一下如何使用GPU进行操作:
<img src="http://nightwish.oss-cn-beijing.aliyuncs.com/2019-08-02-041818.png" alt="" />
我们只需要组装command,然后提交到GPU就可以得到我们想要的结果。</p>
<h3 id="编写在gpu中执行的函数">编写在GPU中执行的函数</h3>
<p>通过下面两个函数,我们来看一下在GPU中计算和在CPU中进行计算的区别,我们先来看一下在CPU中把两个数组中的数相加然后把结果写到第三个数组的计算过程:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">add_arrays</span><span class="p">(</span><span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">inA</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">inB</span><span class="p">,</span>
<span class="kt">float</span><span class="o">*</span> <span class="n">result</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">length</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">index</span> <span class="o"><</span> <span class="n">length</span> <span class="p">;</span> <span class="n">index</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">result</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">inA</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">+</span> <span class="n">inB</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>接下来我们看一下如何使用GPU实现相同的功能,在GPU中执行的函数我们统称为着色器函数(至于为什么叫着色器函数是有历史原因的,因为在GPU执行的函数大多是用来处理图像的像素的,所以叫做着色器函数):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">kernel</span> <span class="kt">void</span> <span class="nf">add_arrays</span><span class="p">(</span><span class="n">device</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">inA</span><span class="p">,</span>
<span class="n">device</span> <span class="k">const</span> <span class="kt">float</span><span class="o">*</span> <span class="n">inB</span><span class="p">,</span>
<span class="n">device</span> <span class="kt">float</span><span class="o">*</span> <span class="n">result</span><span class="p">,</span>
<span class="n">uint</span> <span class="n">index</span> <span class="p">[[</span><span class="n">thread_position_in_grid</span><span class="p">]])</span>
<span class="p">{</span>
<span class="c1">// the for-loop is replaced with a collection of threads, each of which
</span> <span class="c1">// calls this function.
</span> <span class="n">result</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="n">inA</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">+</span> <span class="n">inB</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="p">}</span>
</code></pre></div></div>
<p>先不要管上面函数中<code class="highlighter-rouge">kernel</code>、<code class="highlighter-rouge">[[thread_position_in_grid]]</code>这些代码是什么意思(这些是Metal Shader Language的固定语法,不属于本文的讨论),只看计算过程。通过对比,我们可以发现在GPU中,我们可以并行的对每个元素进行计算,而在CPU中,我们需要遍历数组进行计算。这就是GPU强大的运算能力。我们知道,CPU擅长进行逻辑控制,串行的运算,而GPU擅长大规模的并发计算,比如上面这种简单的运算。而图像处理本身就需要大量的简单运算,所以非常适合GPU,所以一般图像处理的任务都会交给GPU去做。</p>
<h3 id="找到一个gpu对象">找到一个GPU对象</h3>
<p>有了上面的函数,我们如何让这个函数在GPU中执行呢?
首先,我们需要拿到一个GPU。在iOS中,我们可以通过Metal的<code class="highlighter-rouge">MTLCreateSystemDefaultDevice()</code>这个函数去拿到一个GPU的抽象对象<code class="highlighter-rouge">MTLDevice</code>。GPU对象的创建非常耗时,我们平常使用时最好使用一个单例去持有它,然后进行复用。</p>
<h3 id="创建管道让函数在gpu中执行">创建管道让函数在GPU中执行</h3>
<p>通过上面简单的函数,我们获取到GPU的抽象对象。接下来需要配置一下我们前面写的运算函数,来让函数可以在GPU中进行执行。
当我们编译Metal应用时,Xcode会把着色器函数编译到默认的Metal library中。我们可以使用<code class="highlighter-rouge">MTLLibrary</code>和<code class="highlighter-rouge">MTLFunction</code>分别获取Metal library和着色器函数的信息。我们可以通过<code class="highlighter-rouge">MTLDevice</code>创建<code class="highlighter-rouge">MTLLibrary</code>,然后通过<code class="highlighter-rouge">MTLLibrary</code>来获取着色器函数:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_mDevice</span> <span class="o">=</span> <span class="n">device</span><span class="p">;</span>
<span class="n">NSError</span><span class="o">*</span> <span class="n">error</span> <span class="o">=</span> <span class="n">nil</span><span class="p">;</span>
<span class="c1">// Load the shader files with a .metal file extension in the project
</span>
<span class="n">id</span><span class="o"><</span><span class="n">MTLLibrary</span><span class="o">></span> <span class="n">defaultLibrary</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newDefaultLibrary</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">defaultLibrary</span> <span class="o">==</span> <span class="n">nil</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">NSLog</span><span class="p">(</span><span class="err">@</span><span class="s">"Failed to find the default library."</span><span class="p">);</span>
<span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">id</span><span class="o"><</span><span class="n">MTLFunction</span><span class="o">></span> <span class="n">addFunction</span> <span class="o">=</span> <span class="p">[</span><span class="n">defaultLibrary</span> <span class="n">newFunctionWithName</span><span class="o">:</span><span class="err">@</span><span class="s">"add_arrays"</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">addFunction</span> <span class="o">==</span> <span class="n">nil</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">NSLog</span><span class="p">(</span><span class="err">@</span><span class="s">"Failed to find the adder function."</span><span class="p">);</span>
<span class="k">return</span> <span class="n">nil</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>取到着色器函数,接下来我们创建管道来配置函数,因为这里我们想使用GPU进行运算,所以,我们创建一个计算类型的管道,除了计算类型的管道还有渲染类型的管道,也就是我们经常听说的渲染管道。由于这里我们只想计算一下,所以这里我们创建一个计算类型的管道<code class="highlighter-rouge">MTLComputePipelineState</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_mAddFunctionPSO</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newComputePipelineStateWithFunction</span><span class="o">:</span> <span class="n">addFunction</span> <span class="n">error</span><span class="o">:&</span><span class="n">error</span><span class="p">];</span>
</code></pre></div></div>
<h3 id="创建command-queue">创建command queue</h3>
<p>为了把任务提交到GPU,我们需要一个command queue,对应前面图中粉色的轨道。Metal使用命令队列来安排命令。</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_mCommandQueue</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newCommandQueue</span><span class="p">];</span>
</code></pre></div></div>
<h3 id="创建数据">创建数据</h3>
<p>因为这里我们要计算两个数组元素相加对应的和,所以这里我们需要三个数组,两个相加的数组和一个存放结果的数组。GPU可以拥有自己的内存,也可以和操作系统共享内存。Metal使用<code class="highlighter-rouge">MTLResource</code>来管理内存。一个Resource表示在执行command时GPU可以访问的内存。使用<code class="highlighter-rouge">MTLDeveice</code>为它所代表的GPU创建resource。</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_mBufferA</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newBufferWithLength</span><span class="o">:</span><span class="n">bufferSize</span> <span class="n">options</span><span class="o">:</span><span class="n">MTLResourceStorageModeShared</span><span class="p">];</span>
<span class="n">_mBufferB</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newBufferWithLength</span><span class="o">:</span><span class="n">bufferSize</span> <span class="n">options</span><span class="o">:</span><span class="n">MTLResourceStorageModeShared</span><span class="p">];</span>
<span class="n">_mBufferResult</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mDevice</span> <span class="n">newBufferWithLength</span><span class="o">:</span><span class="n">bufferSize</span> <span class="n">options</span><span class="o">:</span><span class="n">MTLResourceStorageModeShared</span><span class="p">];</span>
</code></pre></div></div>
<p>在这个例子中的resource是一个<code class="highlighter-rouge">MTLBuffer</code>对象,它继承自<code class="highlighter-rouge">MTLResource</code>,<code class="highlighter-rouge">MTLBuffer</code>分配的时候没有进行预先格式化。我们可以在使用的时候指定这块内存的格式。我们在分配的时候指定了一个option,<code class="highlighter-rouge">MTLResourceStorageModeShared</code>,这个option表示GPU和CPU都可以访问这块内存。为了计算两个数组相加的结果,我们需要填充<code class="highlighter-rouge">_mBufferA</code>和<code class="highlighter-rouge">_mBufferB</code>:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">-</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">generateRandomFloatData</span><span class="o">:</span> <span class="p">(</span><span class="n">id</span><span class="o"><</span><span class="n">MTLBuffer</span><span class="o">></span><span class="p">)</span> <span class="n">buffer</span>
<span class="p">{</span>
<span class="kt">float</span><span class="o">*</span> <span class="n">dataPtr</span> <span class="o">=</span> <span class="n">buffer</span><span class="p">.</span><span class="n">contents</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">index</span> <span class="o"><</span> <span class="n">arrayLength</span><span class="p">;</span> <span class="n">index</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">dataPtr</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">rand</span><span class="p">()</span><span class="o">/</span><span class="p">(</span><span class="kt">float</span><span class="p">)(</span><span class="n">RAND_MAX</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="创建command-buffer">创建command buffer</h3>
<p>接下来我们通过command queue创建command buffer:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">id</span><span class="o"><</span><span class="n">MTLCommandBuffer</span><span class="o">></span> <span class="n">commandBuffer</span> <span class="o">=</span> <span class="p">[</span><span class="n">_mCommandQueue</span> <span class="n">commandBuffer</span><span class="p">];</span>
</code></pre></div></div>
<h3 id="创建command-encoder">创建command encoder</h3>
<p>为了把command写到command buffer中,我们使用command encoder来编码具体类型的command。这个例子中,我们创建一个计算类型的command encoder,这个encoder编码计算过程。我们调用encoder的一些方法设置一些状态信息,比如设置管道和传入管道的参数。Encoder会把所有的状态改变和command参数写入到command buffer中。</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">id</span><span class="o"><</span><span class="n">MTLComputeCommandEncoder</span><span class="o">></span> <span class="n">computeEncoder</span> <span class="o">=</span> <span class="p">[</span><span class="n">commandBuffer</span> <span class="n">computeCommandEncoder</span><span class="p">];</span>
<span class="p">[</span><span class="n">computeEncoder</span> <span class="n">setComputePipelineState</span><span class="o">:</span><span class="n">_mAddFunctionPSO</span><span class="p">];</span>
<span class="p">[</span><span class="n">computeEncoder</span> <span class="n">setBuffer</span><span class="o">:</span><span class="n">_mBufferA</span> <span class="n">offset</span><span class="o">:</span><span class="mi">0</span> <span class="n">atIndex</span><span class="o">:</span><span class="mi">0</span><span class="p">];</span>
<span class="p">[</span><span class="n">computeEncoder</span> <span class="n">setBuffer</span><span class="o">:</span><span class="n">_mBufferB</span> <span class="n">offset</span><span class="o">:</span><span class="mi">0</span> <span class="n">atIndex</span><span class="o">:</span><span class="mi">1</span><span class="p">];</span>
<span class="p">[</span><span class="n">computeEncoder</span> <span class="n">setBuffer</span><span class="o">:</span><span class="n">_mBufferResult</span> <span class="n">offset</span><span class="o">:</span><span class="mi">0</span> <span class="n">atIndex</span><span class="o">:</span><span class="mi">2</span><span class="p">];</span>
</code></pre></div></div>
<p><img src="http://nightwish.oss-cn-beijing.aliyuncs.com/2019-08-02-41819.png" alt="" />
每一个计算command都会让GPU创建一组线程去执行计算,这里我们需要设置GPU分配的线程数,关于如何计算这个数字,不属于本文讨论范围,想要了解详细的计算可以看<a href="https://developer.apple.com/documentation/metal/calculating_threadgroup_and_grid_sizes?language=objc"> Calculating Threadgroup and Grid Sizes </a>这篇文章。在这里我们先简单设置一下:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">MTLSize</span> <span class="n">gridSize</span> <span class="o">=</span> <span class="n">MTLSizeMake</span><span class="p">(</span><span class="n">arrayLength</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">NSUInteger</span> <span class="n">threadGroupSize</span> <span class="o">=</span> <span class="n">_mAddFunctionPSO</span><span class="p">.</span><span class="n">maxTotalThreadsPerThreadgroup</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">threadGroupSize</span> <span class="o">></span> <span class="n">arrayLength</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">threadGroupSize</span> <span class="o">=</span> <span class="n">arrayLength</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">MTLSize</span> <span class="n">threadgroupSize</span> <span class="o">=</span> <span class="n">MTLSizeMake</span><span class="p">(</span><span class="n">threadGroupSize</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">[</span><span class="n">computeEncoder</span> <span class="n">dispatchThreads</span><span class="o">:</span><span class="n">gridSize</span>
<span class="n">threadsPerThreadgroup</span><span class="o">:</span><span class="n">threadgroupSize</span><span class="p">];</span>
</code></pre></div></div>
<p>描述完整个计算过程后,我们可以调用<code class="highlighter-rouge">[computeEncoder endEncoding];</code>这个方法关闭计算过程。</p>
<h3 id="提交command-buffer执行命令">提交command buffer执行命令</h3>
<p>我们可以通过调用<code class="highlighter-rouge">[commandBuffer commit];</code>这个方法,提交command buffer中的命令。我们提交后,Metal会异步的准备将要执行的命令,然后安排GPU执行命令。当GPU执行完command buffer中的所有命令后,Metal会标记这个command buffer执行完成。</p>
<p>我们的应用可以在GPU执行命令时做一些其他的工作,在这个例子中,我们不需要做其他工作,所以在这里我们等待GPU执行完整个command buffer:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="n">commandBuffer</span> <span class="n">waitUntilCompleted</span><span class="p">];</span>
</code></pre></div></div>
<p>Metal也支持通知的方式来通知应用,我们可以通过command buffer 的<code class="highlighter-rouge">addCompletedHandler:</code>方法添加一个handler,或者监听command buffer的<code class="highlighter-rouge">status</code>属性来在GPU执行命令时做其他工作</p>
<h3 id="读取结果并验证正确定">读取结果并验证正确定</h3>
<p>当command buffer被执行后,GPU的计算结果存放在输出的buffer中。在一个真实的应用中,我们会从buffer中读取结果,并使用这个结果做一些事情,比如:在屏幕上显示结果或着把结果写入到文件中。
在我们这个例子中,我们仅仅读取一下结果验证一下在GPU中计算是否和在CPU中计算的是否一致。</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">-</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">verifyResults</span>
<span class="p">{</span>
<span class="kt">float</span><span class="o">*</span> <span class="n">a</span> <span class="o">=</span> <span class="n">_mBufferA</span><span class="p">.</span><span class="n">contents</span><span class="p">;</span>
<span class="kt">float</span><span class="o">*</span> <span class="n">b</span> <span class="o">=</span> <span class="n">_mBufferB</span><span class="p">.</span><span class="n">contents</span><span class="p">;</span>
<span class="kt">float</span><span class="o">*</span> <span class="n">result</span> <span class="o">=</span> <span class="n">_mBufferResult</span><span class="p">.</span><span class="n">contents</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">index</span> <span class="o"><</span> <span class="n">arrayLength</span><span class="p">;</span> <span class="n">index</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">assert</span><span class="p">(</span><span class="n">result</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">==</span> <span class="n">a</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">+</span> <span class="n">b</span><span class="p">[</span><span class="n">index</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h2 id="总结">总结</h2>
<p>通过这个例子,我们可以看到,通过编码命令的方式,可以让GPU执行一些任务。在这个例子中,我们仅仅让GPU执行了一些运算任务。让GPU执行一些其他的任务和计算任务大同小异,都是这些过程:</p>
<ol>
<li>先要找到GPU对象</li>
<li>编写任务函数</li>
<li>通过编码任务把数据丢给GPU</li>
<li>提交任务。</li>
</ol>
<p>我们可以利用GPU执行很多任务,但是一定要发挥GPU强大的运算能力。比如图像处理就是让GPU对每个像素执行一些算法,然后变换出一些滤镜效果。一定不要把CPU擅长的任务交给GPU去做,这就南辕北辙了。接下来我们将介绍一下GPU最重要的能力——渲染。</p>
<p>本文实例代码:<a href="https://docs-assets.developer.apple.com/published/5483f37f98/PerformingCalculationsOnAGPU.zip">代码</a></p>
</section>
<footer class="post-footer">
<!-- Everything inside the #author tags pulls data from the author -->
<!-- #author-->
<figure class="author-image">
<a class="img" href="https://github.com/zhangdongpo"
style="background-image: url(/assets/images/freelf.jpg)"><span
class="hidden">Freelf's Picture</span></a>
</figure>
<section class="author">
<h4><a href="/author/Freelf">Freelf</a></h4>
<p> iOS Developer</p>
<div class="author-meta">
<span class="author-location icon-location">
Beijing, China</span>
<span class="author-link icon-link"><a href="https://freelf.me"> freelf.me</a></span>
</div>
</section>
<!-- /author -->
<section class="share">
<h4>Share this post</h4>
<a class="icon-twitter"
href="http://twitter.com/share?text=Metal基础概念——利用GPU高效计算&url=Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97"
onclick="window.open(this.href, 'twitter-share', 'width=550,height=235');return false;">
<span class="hidden">Twitter</span>
</a>
<a class="icon-facebook"
href="https://www.facebook.com/sharer/sharer.php?u=Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97"
onclick="window.open(this.href, 'facebook-share','width=580,height=296');return false;">
<span class="hidden">Facebook</span>
</a>
<a class="icon-google-plus"
href="https://plus.google.com/share?url=Metal%E5%9F%BA%E7%A1%80%E6%A6%82%E5%BF%B5-%E5%88%A9%E7%94%A8GPU%E9%AB%98%E6%95%88%E8%AE%A1%E7%AE%97"
onclick="window.open(this.href, 'google-plus-share', 'width=490,height=530');return false;">
<span class="hidden">Google+</span>
</a>
</section>
<!-- Add Disqus Comments -->
<div id="gitalk-container"></div>
<script src="/assets/js/md5.min.js"></script>
<script>
const gitalk = new Gitalk({
clientID: 'b62d799fc3a6b1dd7de9',
clientSecret: 'd5a315aa0855dd6a00cdb4025b88059c1b0c585f',
repo: 'blog',
owner: 'zhangdongpo',
admin: 'zhangdongpo',
id: md5(location.pathname), // Ensure uniqueness and length less than 50
distractionFreeMode: false // Facebook-like distraction free mode
})
gitalk.render('gitalk-container')
</script>
</footer>
</article>
</main>
<!-- /post -->
<!-- The tiny footer at the very bottom -->
<footer class="site-footer clearfix">
<section class="copyright"><a href="/">面向自由编程</a> © 2024</section>
<section class="poweredby">Proudly published with <a href="https://jekyllrb.com/">Jekyll</a> using <a href="https://github.com/jekyller/jasper">Jasper</a></section>
</footer>
</div>
<!-- highlight.js -->
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.3.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<!-- jQuery needs to come before `` so that jQuery can be used in code injection -->
<script type="text/javascript" src="//code.jquery.com/jquery-1.12.0.min.js"></script>
<!-- Ghost outputs important scripts and data with this tag -->
<!-- -->
<!-- Add Google Analytics -->
<!-- Google Analytics Tracking code -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', '', 'auto');
ga('send', 'pageview');
</script>
<!-- Fitvids makes video embeds responsive and awesome -->
<script type="text/javascript" src="/assets/js/jquery.fitvids.js"></script>
<!-- The main JavaScript file for Casper -->
<script type="text/javascript" src="/assets/js/index.js"></script>
</body>
</html>