Posts tagged 'CUDA'

4.4 Thread Configuration First, we choose a warp-size multiple for the block horizontal size: 32 · 2 = 64. Then we have to choose a block vertical size. Because the typical maximum number of threads per block is 1024, we cannot choose 64 for the vertical dimension, as we would have 64 · 64 = 4096 threads! For this reason we choose a 64x4 block size (256 threads per block). We divide the domain size in 64..
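The block-size reasoning above can be sketched as a launch configuration. This is only a minimal illustration, not the paper's code: the 512x512 domain, the `ceil_div` helper, and the `mark_cells` kernel are all assumptions made for the example, since the excerpt cuts off before it says how the domain is divided.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical domain size; the excerpt does not state one.
constexpr int DOMAIN_W = 512;
constexpr int DOMAIN_H = 512;

// Block shape from the text: 64 x 4 = 256 threads per block.
constexpr int BLOCK_W = 64;  // 2 warps wide (32 * 2)
constexpr int BLOCK_H = 4;

// Ceiling division, so a domain that is not a multiple of the
// block size is still fully covered (assumed helper).
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }

// Placeholder kernel: each thread writes its own linear cell index.
__global__ void mark_cells(int *cells, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {  // guard threads past the domain edge
        cells[y * width + x] = y * width + x;
    }
}

int main()
{
    dim3 block(BLOCK_W, BLOCK_H);
    dim3 grid(ceil_div(DOMAIN_W, BLOCK_W), ceil_div(DOMAIN_H, BLOCK_H));

    int *d_cells = nullptr;
    if (cudaMalloc(&d_cells, sizeof(int) * DOMAIN_W * DOMAIN_H) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    mark_cells<<<grid, block>>>(d_cells, DOMAIN_W, DOMAIN_H);
    cudaError_t err = cudaDeviceSynchronize();

    printf("grid = %u x %u blocks, block = %u x %u threads: %s\n",
           grid.x, grid.y, block.x, block.y, cudaGetErrorString(err));

    cudaFree(d_cells);
    return err == cudaSuccess ? 0 : 1;
}
```

A 64x64 block would ask for 4096 threads, and on a device with a 1024-thread limit the launch would fail with an error, which is why the vertical size is cut down to 4.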

4 CUDA Fluids Implementation This section describes the main changes needed to port a CPU implementation to CUDA. First, we present alternatives to CPU interpolation and the CPU fftw library. Second, we explain how to access an OpenGL vertex buffer from a kernel. After that, we need to allocate memory on the GPU device that the kernels will use for the simulation step comput..

2. Introduction This section explains how a CPU fluid simulation can be made faster with CUDA, a GPU solution. Before presenting any implementation details, we review some key points of CUDA. After that, we explain how to lay out threads for parallel fluid computation. The reader will also find GPU-optimized alternatives to CPU libraries. Finally, we w..

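I am stuck right now. I am trying to convert an existing project to GPU programming, but NVIDIA's "Learning CUDA by Example" or whatever it is called cannot get me out of this. The explanations are simply too thin for my cute little brain to follow. Then, while browsing the web, I found a PDF from the Technical University of Madrid that explains one of NVIDIA's simulation samples, the fluid simulation, implemented with OpenGL. This paper, like rain after a long drought, says the following in its preface: "NVIDIA's fluid simulation does not explain its code in detail, so this document should be especially useful to anyone who wants to extend and study that part further." What a beautiful preface. It would be fair to say it was written just for me, so well does it fit my ..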

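So far we have looked at how data is processed in CUDA. This time, let's go through it in detail with a simple CUDA example. Earlier, we said that data-parallel processing on the GPU proceeds in the following order.
Data-parallel processing in CUDA
1. Allocate the input and output data in PC memory
2. Allocate the input and output data in graphics memory
3. Write the values to be processed into PC memory
4. Copy the input data from PC memory to graphics memory
5. Partition the data and bring it onto the GPU
6. Create thousands of threads or more and process the data in parallel with a kernel function
7. Merge the processed results
8. Transfer the results to PC memory
9. Free the graphics memory
10. Free the PC memory
1. Allocate the input and output data in PC mem..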
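The ten steps can be sketched as one small program. This is only an illustration under assumptions, not the post's own example (which is cut off here): the element-wise addition, the names `add_arrays` and `blocks_for`, and the array size are all made up for the sketch. Steps 5 and 7 have no explicit API call in this version; the partitioning comes from each thread computing its own index, and the "merge" is simply every thread writing into the same output array. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements (assumed helper).
int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// Step 6: the kernel function, run by every thread in parallel.
__global__ void add_arrays(const int *a, const int *b, int *c, int n)
{
    // Step 5: the thread index selects this thread's share of the data.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Step 7: all threads write into one shared output array.
        c[i] = a[i] + b[i];
    }
}

int main()
{
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);

    // 1. Allocate input and output data in PC (host) memory.
    int *h_a = static_cast<int *>(malloc(bytes));
    int *h_b = static_cast<int *>(malloc(bytes));
    int *h_c = static_cast<int *>(malloc(bytes));

    // 2. Allocate input and output data in graphics (device) memory.
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // 3. Write the values to be processed into PC memory.
    for (int i = 0; i < n; ++i) {
        h_a[i] = i;
        h_b[i] = 2 * i;
    }

    // 4. Copy the input data from PC memory to graphics memory.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // 5-7. Launch enough threads to cover every element.
    const int threads = 256;
    add_arrays<<<blocks_for(n, threads), threads>>>(d_a, d_b, d_c, n);

    // 8. Transfer the results to PC memory. On the default stream this
    //    copy waits for the kernel to finish first.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    // Check the result on the host.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h_c[i] != 3 * i) { ok = false; break; }
    }
    printf("%s\n", ok ? "result matches" : "result MISMATCH");

    // 9. Free the graphics memory.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    // 10. Free the PC memory.
    free(h_a);
    free(h_b);
    free(h_c);

    return ok ? 0 : 1;
}
```

Assuming the file is saved as `vector_add.cu`, it can be built with `nvcc vector_add.cu -o vector_add`.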
