This one’s going to be kind of all over the place, and it’s mostly going to be based on the things I’ve been seeing in an initial pass at experimenting with UE5. It’s out and available to start messing around with, whether precompiled through the EGS launcher or from GitHub source. I went with the latter.
An Unreal Engine major upgrade is always a big deal for me. I’ve been working in Unreal for a long time, so making that step forward is always a mix of familiarity with the old style and figuring out what’s new. My timeline with Unreal looks like this:
Unreal Engine 2.x
Unreal Demolition – 2004-2006, UT2004 mod in Unreal Engine 2.5
Unreal Engine 3
Unreal Demolition – 2007-2010, UT3 mod
ARC Squadron / ARC Squadron: Redux – 2011-2013, UE3 game on iOS, Android, Fire OS
Rocket League – 2013 – Initial work getting the game ported to non-Windows platforms
Cancelled mobile project – 2013-2014, engine and tool improvements to modernize UE3 on mobile for cancelled internal project
Smite – 2014-2016 – Gameplay engineer manager and generalist for console ports
Killing Floor 2 – 2017 – Gameplay engineer
Unreal Engine 4
Hand of the Gods – 2016-2017 – Prototyped in UE3, ported to UE4 for public release
Studio R&D – 2017-2018 – Worked on internal prototypes and R&D for UE4 between KF2 and Maneater projects
Maneater – 2018-2019 – Gameplay lead for multiplatform project
Squanch projects – 2019 – present. Studio has continued using UE4 for internal projects since I arrived. Beyond its use in Trover Saves the Universe, the development team in general has a lot of previous history with Unreal.
Because Unreal has been so key to my career, I jumped into UE5 the second I had a chance and have been messing around since. These ramblings are basically the things that have stood out to me in my digging around over the past few days.
In talking with Quintosh on Twitter, he asked why I didn’t go with some sort of thread pool. Frankly, it’s because I’m lazy. However, it did give me an idea. The data I’m working on is fairly parallelizable without memory collisions. Rather than creating a thread pool where I hand work to threads, it would be a lot less work for a simple program if I let the threads directly request work themselves. This gave me a few things:
A unit of work could be much smaller, so cores that are running slower because of other system processes don’t become bottlenecks.
Once I implement other things that scale the program non-linearly (ex: reflection/refraction), I don’t have to worry about intelligently breaking up the work into equal sizes.
The only member that is accessed across many threads is the piece of data controlling the work request. This keeps my actual locks to a minimum.
Much to my surprise, I also managed to get Intel VTune working completely. This helped confirm some of my assumptions from yesterday’s article, so I’ll cover that in some detail later on.
Being a primarily Unreal-focused developer, I don’t really spend that much time in standard C++. Ya, technically I work a lot in C++, but C++ using STL and related things is very different from the custom containers and macro-heavy nature of working in Unreal. Part of the fallout of that is that I generally miss new features of the language for quite a while. I didn’t get into working in C++11 and newer until I was off of working in UE3, and even moving into UE4 I don’t get exposed to things like STL or the standard implementation of threads. It’s one of those things where, ya, I’ve used threads and I get the concepts behind it, but creating a worker thread to offload a specific task is much different than architecting code to properly and efficiently support a threadable workload. That’s where my screwing around here comes into play.
For this screwing around, I decided to thread an implementation of a raytracer. It’s a workload that is inherently parallelizable. You’ve got a bunch of rays going out that can independently resolve themselves. Ya, you may need to have a ray spawn further rays, but that can live within the worker thread as it chews through the work. From a naive implementation standpoint, each pixel could be its own thread and run in parallel, and that’s basically where I started.
For the purposes of this, I started with a sample implementation from Ray Tracing in One Weekend by Peter Shirley. This series of books is a supremely fantastic quick look at basic concepts behind ray tracing, and gave me a quick place to get to a point where I could investigate threading.
For my CPU, I’m running this on an AMD 3950x (16 core, 32 thread) at stock speeds. I’m not doing anything to minimize background processes, but it shouldn’t be a huge issue for where I’m at.
I’m currently using Visual Studio 2019’s built-in performance profiler. I don’t particularly like it compared to other tools, but my profiler of choice on my current hardware (AMD uProf) currently has a bug on some installs of the May 2020 version of Windows 10 that prevents profile captures. The VS profiler is basic, but gets me enough information for the basics that I’m starting at.
This is running in release with default optimizations purely out of laziness.
I’ll post some code samples around. These won’t generally compile because I’m stripping out unnecessary stuff from the samples (ex: you don’t need to care about me setting image dimensions and writing it to disk).
For the purposes of my current testing, this is the output image. It’s two spheres where each pixel color represents the surface normal hit by a ray. The image is 800×400 resolution and each pixel does 100 slightly randomized rays to give an anti-aliased result. In the current basic pass, I’m not doing any bounced rays on collisions. The final image is therefore the result of 32 million ray casts. In some future tests, I’ll be adapting the rest of the book to the multithreaded version, supporting reflection/refraction and increasing the workload through that process.