[{"title":"Mipmap Implementation Explained: From Principles to Code (with Mathematical Derivation)","path":"/2025/04/25/Mipmap/","content":"In real-time rendering, texture mapping is the key technique for giving model surfaces their detail. However, when a model with a high-resolution texture is far from the camera, or is viewed at a grazing angle, a single screen pixel can cover many texels of the texture. Sampling the texture directly in that situation produces severe Moiré patterns and shimmering, collectively known as texture aliasing. Mipmapping is the classic solution. Its core idea is to pre-generate a chain of progressively lower-resolution versions of the texture (mip levels) and, at render time, to sample from the level that matches the level of detail (LOD) the screen pixel requires, which reduces aliasing and improves rendering performance. This article walks through a mipmap implementation in detail: generating the mip chain, the key LOD calculation (with its mathematical derivation), and the final trilinear-filtered sampling. 1. Mipmap Generation A mipmap is built from a series of lower-resolution images of the texture: Level 0: the original, highest-resolution texture. Level 1: half the resolution of Level 0 (half the width and half the height). Level 2: half the resolution of Level 1. … and so on, until the level is 1x1 (a dimension that has already reached 1 stays at 1). Generation method The most common approach is downsampling. A simple implementation uses a 2x2 box filter: the colors of each 2x2 block of pixels in the previous level are averaged to produce one pixel of the next level. The generateNextMipLevel function from the TGATexture implementation is shown below: namespace &#123; /* anonymous namespace */ /* Simple box-filter downsampling for mipmap generation */ bool generateNextMipLevel(const Texture::MipLevel&amp; inputLevel, Texture::MipLevel&amp; outputLevel) &#123; if (inputLevel.width &lt;= 1 &amp;&amp; inputLevel.height &lt;= 1) &#123; return false; /* cannot downsample further */ &#125; outputLevel.width = std::max(1, inputLevel.width / 2); outputLevel.height = std::max(1, inputLevel.height / 2); outputLevel.pixels.resize(outputLevel.width * outputLevel.height); for (int y = 0; y &lt; outputLevel.height; ++y) &#123; for (int x = 0; x &lt; outputLevel.width; ++x) &#123; /* Top-left corner of the corresponding 2x2 block in the input level */ int inputX = x * 2; int inputY = y * 2; /* Average the 2x2 block (coordinates are clamped at the borders) */ const vec3f&amp; p00 = inputLevel.pixels[std::min(inputLevel.height - 1, inputY + 0) * inputLevel.width + std::min(inputLevel.width - 1, inputX + 0)]; const vec3f&amp; p10 = inputLevel.pixels[std::min(inputLevel.height - 1, inputY + 0) * inputLevel.width + std::min(inputLevel.width - 1, inputX + 1)]; const vec3f&amp; p01 = inputLevel.pixels[std::min(inputLevel.height - 1, inputY + 1) * inputLevel.width + 
std::min(inputLevel.width - 1, inputX + 0)]; const vec3f&amp; p11 = inputLevel.pixels[std::min(inputLevel.height - 1, inputY + 1) * inputLevel.width + std::min(inputLevel.width - 1, inputX + 1)]; vec3f sumColor = (p00 + p10 + p01 + p11) * 0.25f; /* average of the 4 pixels */ outputLevel.pixels[y * outputLevel.width + x] = sumColor; &#125; &#125; return true; &#125; &#125; /* end anonymous namespace */ In TGATexture::load, the mip chain is generated as follows: /* Inside TGATexture::load, after loading the base level */ int currentLevelIndex = 0; while (mipLevels[currentLevelIndex].width &gt; 1 || mipLevels[currentLevelIndex].height &gt; 1) &#123; Texture::MipLevel nextLevel; if (!generateNextMipLevel(mipLevels[currentLevelIndex], nextLevel)) break; mipLevels.push_back(std::move(nextLevel)); currentLevelIndex++; /* Safety break ... */ &#125; Loading pre-generated mipmaps Some texture formats (such as DDS) can store pre-generated mip levels directly. Loading then only requires reading each level in order and decompressing it if necessary: /* Inside DDSTexture::load */ uint32_t numLevels = (header.flags &amp; DDSD_MIPMAPCOUNT) ? header.mipMapCount : 1; mipLevels.resize(numLevels); int currentWidth = baseWidth; int currentHeight = baseHeight; for (uint32_t level = 0; level &lt; numLevels; ++level) &#123; /* ... calculate dataSize for this level ... */ std::vector&lt;unsigned char&gt; compressedData(dataSize); file.read(reinterpret_cast&lt;char*&gt;(compressedData.data()), dataSize); /* ... check read errors ... */ mipLevels[level].width = currentWidth; mipLevels[level].height = currentHeight; /* Decompress this level into mipLevels[level].pixels */ bool success = false; if (isDXT1) success = decompressDXT1LevelInternal(compressedData, currentWidth, currentHeight, mipLevels[level].pixels); /* ... handle DXT5, ATI2 etc. ... */ if (!success) &#123; /* handle error */ &#125; /* Dimensions of the next level */ currentWidth = std::max(1, currentWidth / 2); currentHeight = std::max(1, currentHeight / 2); &#125; 2. 
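Level of Detail (LOD) Calculation The LOD calculation is the heart of mipmapping. For every pixel on screen we compute an LOD value that expresses how much texture detail the pixel needs. The higher the LOD value, the less detail is needed and the lower the resolution of the mip level that should be used. The LOD is derived from the rate of change of the texture coordinates in screen space. If the texture coordinates $(u, v)$ change quickly with respect to the screen coordinates $(x, y)$ (for example, when the texture is strongly minified), a blurrier mip level (high LOD) is needed; conversely, if they change slowly (the texture is magnified), a sharper mip level (low LOD) is needed. Mathematical derivation The goal is to compute the partial derivatives $\\frac{\\partial u}{\\partial x}$, $\\frac{\\partial u}{\\partial y}$, $\\frac{\\partial v}{\\partial x}$, $\\frac{\\partial v}{\\partial y}$. Because of perspective projection, $u$ and $v$ are not linear functions of the screen coordinates $x$ and $y$, so computing these derivatives directly is awkward. The perspective-correct attributes, that is, the attributes after the perspective divide, are linear functions of the screen coordinates. They include $u' &#x3D; \\frac{u}{w}$, $v' &#x3D; \\frac{v}{w}$ and $q &#x3D; \\frac{1}{w}$, where $w$ is the homogeneous $W$ component of the vertex after transformation to clip space. We can therefore first compute the screen-space gradients of these attributes: $\\frac{\\partial u'}{\\partial x}$, $\\frac{\\partial u'}{\\partial y}$, $\\frac{\\partial v'}{\\partial x}$, $\\frac{\\partial v'}{\\partial y}$, $\\frac{\\partial q}{\\partial x}$, $\\frac{\\partial q}{\\partial y}$. These gradients are constant across the triangle and can be computed once during triangle setup (before rasterization). Let the triangle's screen-space vertices be $(x_0, y_0), (x_1, y_1), (x_2, y_2)$, with corresponding values $a_0, a_1, a_2$ of some perspective-correct attribute. This gives the linear system: $$\\begin{aligned}a_0 &amp;&#x3D; A x_0 + B y_0 + C \\\\ a_1 &amp;&#x3D; A x_1 + B y_1 + C \\\\ a_2 &amp;&#x3D; A x_2 + B y_2 + C\\end{aligned}$$ Solving it yields $A &#x3D; \\frac{\\partial a}{\\partial x}$ and $B &#x3D; \\frac{\\partial a}{\\partial y}$. Using Cramer's rule or direct elimination: $$\\frac{\\partial a}{\\partial x} &#x3D; \\frac{(a_1 - a_0)(y_2 - y_0) - (a_2 - a_0)(y_1 - y_0)}{(x_1 - x_0)(y_2 - y_0) - (x_2 - x_0)(y_1 - y_0)}$$ $$\\frac{\\partial a}{\\partial y} &#x3D; \\frac{(a_2 - a_0)(x_1 - x_0) - (a_1 - a_0)(x_2 - x_0)}{(x_1 - x_0)(y_2 - y_0) - (x_2 - x_0)(y_1 - y_0)}$$ The denominator is twice the signed screen-space area of the triangle. Let $\\Delta &#x3D; (x_1 - x_0)(y_2 - y_0) - (x_2 - x_0)(y_1 - y_0)$. Substituting $u', v', q$ for $a$ gives $\\frac{\\partial u'}{\\partial x}$, $\\frac{\\partial v'}{\\partial x}$, $\\frac{\\partial q}{\\partial x}$, and so on. The corresponding code: /* Inside calculateAccurateGradients(const ScreenVertex v[3]) */ AccurateScreenSpaceGradients grads; float x0 = static_cast&lt;float&gt;(v[0].x), y0 = 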
static_cast&lt;float&gt;(v[0].y); /* ... x1, y1, x2, y2 ... */ vec2f uv0_over_w = v[0].varyings.uv * v[0].invW; /* ... uv1_over_w, uv2_over_w ... */ float invW0 = v[0].invW; /* this is q0 */ /* ... invW1, invW2 ... */ float delta = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0); if (std::abs(delta) &lt; 1e-9f) return grads; /* degenerate triangle */ float invDelta = 1.0f / delta; /* Gradients of the perspective-correct attributes */ grads.dUVoverW_dX = ((uv1_over_w - uv0_over_w) * (y2 - y0) - (uv2_over_w - uv0_over_w) * (y1 - y0)) * invDelta; grads.dUVoverW_dY = ((uv2_over_w - uv0_over_w) * (x1 - x0) - (uv1_over_w - uv0_over_w) * (x2 - x0)) * invDelta; grads.dInvW_dX = ((invW1 - invW0) * (y2 - y0) - (invW2 - invW0) * (y1 - y0)) * invDelta; /* dq/dx */ grads.dInvW_dY = ((invW2 - invW0) * (x1 - x0) - (invW1 - invW0) * (x2 - x0)) * invDelta; /* dq/dy */ The derivatives of the original texture coordinates $(u, v)$ then follow from the chain rule, here in its quotient-rule form, because $u &#x3D; \\frac{u'}{q}$ and $v &#x3D; \\frac{v'}{q}$: $$\\frac{\\partial u}{\\partial x} &#x3D; \\frac{\\partial}{\\partial x} \\left( \\frac{u'}{q} \\right) &#x3D; \\frac{\\frac{\\partial u'}{\\partial x} q - u' \\frac{\\partial q}{\\partial x}}{q^2} &#x3D; \\frac{1}{q} \\frac{\\partial u'}{\\partial x} - \\frac{u'}{q^2} \\frac{\\partial q}{\\partial x} &#x3D; w \\frac{\\partial u'}{\\partial x} - u w \\frac{\\partial q}{\\partial x}$$ $$\\frac{\\partial v}{\\partial x} &#x3D; w \\frac{\\partial v'}{\\partial x} - v w \\frac{\\partial q}{\\partial x}$$ $$\\frac{\\partial u}{\\partial y} &#x3D; w \\frac{\\partial u'}{\\partial y} - u w \\frac{\\partial q}{\\partial y}$$ $$\\frac{\\partial v}{\\partial y} &#x3D; w \\frac{\\partial v'}{\\partial y} - v w \\frac{\\partial q}{\\partial y}$$ The last step uses $\\frac{1}{q} &#x3D; w$ and $\\frac{u'}{q^2} &#x3D; u w$. These expressions must be evaluated at every pixel, because $u, v, w$ (and $q &#x3D; \\frac{1}{w}$) are obtained by interpolation and vary from pixel to pixel: /* Inside the drawScanlines pixel loop (x loop) */ float currentInvW = invWa + (invWb - invWa) * tHoriz; /* interpolated q = 1/w */ if (std::abs(currentInvW) &lt; 1e-9f) continue; /* avoid division by zero */ float currentW = 1.0f / currentInvW; Varyings finalVaryings = 
interpolateVaryings(tHoriz, varyingsA, varyingsB, invWa, invWb); /* interpolated u, v, etc. */ /* --- Accurate derivative calculation using the chain rule --- */ vec2f uv_ddx = currentW * gradients.dUVoverW_dX - finalVaryings.uv * currentW * gradients.dInvW_dX; vec2f uv_ddy = currentW * gradients.dUVoverW_dY - finalVaryings.uv * currentW * gradients.dInvW_dY; Next, compute the scalar $\\rho$, which measures how strongly the texture is stretched or compressed on screen. It is usually taken as the larger of the lengths of the rate-of-change vectors along $x$ and $y$: $$\\rho &#x3D; \\max \\left( \\sqrt{ \\left( \\frac{\\partial u}{\\partial x} \\right)^2 + \\left( \\frac{\\partial v}{\\partial x} \\right)^2 }, \\sqrt{ \\left( \\frac{\\partial u}{\\partial y} \\right)^2 + \\left( \\frac{\\partial v}{\\partial y} \\right)^2 } \\right)$$ With $(u, v)$ measured in texels, $\\rho$ means that moving one pixel on screen corresponds to moving roughly $\\rho$ texels in texture space. Finally, the LOD value $\\lambda$ (OpenGL terminology) is: $$\\lambda &#x3D; \\log_2(\\rho)$$ If $\\lambda &#x3D; 0$, one screen pixel corresponds to one texel and Level 0 is used; if $\\lambda &#x3D; 1$, one screen pixel covers 2x2 texels and Level 1 is used; in general, if $\\lambda &#x3D; k$, one screen pixel covers $2^k \\times 2^k$ texels and Level $k$ is used. The code below multiplies the UV derivatives by the base-level size $W_{tex}, H_{tex}$, which converts derivatives of normalized coordinates into texels, and it computes $\\rho^2$ directly to avoid the square root: $$\\rho^2 \\approx \\max \\left( \\left| \\frac{d(u,v)}{dx} \\right|^2 W_{tex}^2, \\left| \\frac{d(u,v)}{dy} \\right|^2 H_{tex}^2 \\right)$$ Here $\\frac{d(u,v)}{dx} &#x3D; \\left( \\frac{\\partial u}{\\partial x}, \\frac{\\partial v}{\\partial x} \\right)$ and the squared vector length is used. Scaling the whole $x$ derivative by $W_{tex}$ and the whole $y$ derivative by $H_{tex}$ is a simplification; it agrees with the exact texel-space derivatives when the texture is square ($W_{tex} &#x3D; H_{tex}$). The LOD is then: $$\\lambda &#x3D; \\frac{1}{2} \\log_2(\\rho^2)$$ The code: /* Inside Texture::sample(..., const vec2f&amp; ddx, const vec2f&amp; ddy) */ const auto&amp; baseLevel = mipLevels[0]; float baseWidth = static_cast&lt;float&gt;(baseLevel.width); float baseHeight = static_cast&lt;float&gt;(baseLevel.height); /* rho squared, in texels */ float rho_sq = std::max(ddx.lengthSq() * baseWidth * baseWidth, ddy.lengthSq() * baseHeight * baseHeight); /* LOD level */ float lod = 0.0f; if (rho_sq &gt; 1e-9f) &#123; /* avoid log(0) */ lod = 0.5f * std::log2(rho_sq); &#125; lod = std::max(0.0f, lod); /* clamp LOD &gt;= 0 */ 3. 
Mipmap Sampling (Trilinear Filtering) Once the LOD value $\\lambda$ is known, the color is sampled from the mip chain with trilinear filtering: Select the levels: use $\\lambda$ to find the two nearest mip levels, $D_0 &#x3D; \\lfloor \\lambda \\rfloor$ (rounded down) and $D_1 &#x3D; D_0 + 1$, making sure that neither exceeds the highest mip level index. Bilinear sampling within each level: for both $D_0$ and $D_1$, sample at the texture coordinates $(u, v)$ with bilinear filtering to obtain the colors $C_0$ and $C_1$. /* Texture::sampleBilinear(const MipLevel&amp; level, float u, float v) helper function */ /* ... calculates x0, y0, u_frac, v_frac ... */ /* ... samples 4 neighbors c00, c10, c01, c11 with clamping ... */ /* Bilinear interpolation: */ vec3f top = c00 * (1.0f - u_frac) + c10 * u_frac; vec3f bottom = c01 * (1.0f - u_frac) + c11 * u_frac; return top * (1.0f - v_frac) + bottom * v_frac; Linear interpolation between levels: take the fractional part of $\\lambda$, $t &#x3D; \\lambda - \\lfloor \\lambda \\rfloor$, and interpolate linearly between $C_0$ and $C_1$ to obtain the final color $C$: $$C &#x3D; C_0 \\times (1 - t) + C_1 \\times t$$ The code: /* Inside Texture::sample(...) after calculating lod */ int maxLevel = static_cast&lt;int&gt;(mipLevels.size()) - 1; int level0_idx = static_cast&lt;int&gt;(std::floor(lod)); level0_idx = std::min(level0_idx, maxLevel); /* clamp */ /* Sample the first level with bilinear filtering */ vec3f color0 = sampleBilinear(mipLevels[level0_idx], u, v); /* At the highest LOD, or with only one level, return the bilinear result */ if (level0_idx == maxLevel) &#123; return color0; &#125; /* Second level index for trilinear interpolation */ int level1_idx = level0_idx + 1; /* cannot exceed maxLevel because of the early return above */ /* Sample the second level with bilinear filtering */ vec3f color1 = sampleBilinear(mipLevels[level1_idx], u, v); /* Interpolation factor between the two levels */ float level_t = lod - static_cast&lt;float&gt;(level0_idx); /* fractional part of LOD */ /* Trilinear interpolation */ return color0 * (1.0f - level_t) + color1 * level_t; 4. 
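Integration into the Rendering Pipeline The mipmap workflow is integrated into the renderer as follows: Vertex shader: Processes the vertices and outputs clip-space coordinates and the varyings to be interpolated (including the texture coordinates $uv$). Triangle processing (processFace): Applies the perspective divide and viewport transform to the vertices to obtain the screen coordinates $(x, y)$ and invW. Performs back-face culling. Calls calculateAccurateGradients to compute the per-triangle gradients $\\frac{\\partial (u&#x2F;w)}{\\partial x}$, $\\frac{\\partial (1&#x2F;w)}{\\partial x}$, and so on. Calls drawTriangle. Triangle rasterization (drawTriangle, drawScanlines): Iterates over the pixels covered by the triangle. For each pixel, computes the interpolated varyings (including $uv$) and invW using barycentric coordinates or edge interpolation. Uses the chain rule and the precomputed gradients to compute the per-pixel $\\frac{\\partial u}{\\partial x}$, $\\frac{\\partial v}{\\partial x}$, $\\frac{\\partial u}{\\partial y}$, $\\frac{\\partial v}{\\partial y}$. Calls the fragment shader, passing the interpolated varyings and the UV derivatives. Fragment shader (fragment): Receives the interpolated varyings and the UV derivatives (uv_ddx, uv_ddy). Calls Texture::sample(u, v, uv_ddx, uv_ddy) to perform the LOD calculation and trilinear filtering and return a color. Uses the sampled result in the lighting calculation and outputs the final pixel color. 5. Conclusion Mipmapping is an indispensable technique in modern real-time rendering. By precomputing the texture at multiple resolutions and choosing the level to sample from according to the screen-space rate of change (usually with trilinear filtering), it markedly reduces texture aliasing and improves image quality, while also improving performance by reducing accesses to high-resolution texture data.","tags":["Rendering","Graphics","Mipmap","Texture Filtering","LOD"],"categories":["Computer Graphics"]},{"title":"Implementing FPS Camera Movement in a 3D Application","path":"/2025/04/24/FPS_Camera_Movement_Guide/","content":"Implementing FPS Camera Movement in a 3D Application First-Person Shooter (FPS) camera movement is a core mechanic in many 3D applications, providing an immersive perspective where the camera mimics the viewpoint of a character. This guide details how to implement an FPS-style camera based on the provided codebase changes, which use C++, SDL, and a custom 3D rendering engine. The implementation covers mouse-based look controls, keyboard-based movement, and integration with a scene management system. Overview of Changes The provided diff modifies several files to enable FPS-style camera controls. Key changes include: Camera Class Overhaul (camera.h, camera.cpp): The Camera class now uses yaw and pitch for orientation instead of a target-based look-at system, enabling smoother FPS-style mouse look. New methods handle keyboard and mouse input for movement and rotation. 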
SDL Application Enhancements (sdl_app.h, sdl_app.cpp): Input handling now tracks keyboard states and mouse motion, with a toggle for mouse look mode (using the Escape key). Scene Loading (scene.yaml, scene.cpp): The scene configuration supports yaw and pitch for camera initialization, replacing the target-based setup. Math Utilities (vector.h, quaternion.h, transform.cpp): Optimizations like lengthSq() and improved numerical stability for vector and quaternion operations. Miscellaneous: CMakeLists.txt: Switches ImGui to a static library. blinn_phong_shader.cpp, model.cpp: Minor optimizations using lengthSq() for performance. resource_manager.cpp, main.cpp: Minor cleanup and logging improvements. This guide focuses on the FPS camera implementation, explaining the core components and how they integrate. Step-by-Step Implementation1. Camera Class DesignThe Camera class (include/core/camera.h, src/core/camera.cpp) is the heart of the FPS camera system. It manages position, orientation (via yaw and pitch), and projection matrices for rendering. Key Features Constructor: Initializes the camera with a position, yaw, and pitch, defaulting to a forward-facing view (yaw &#x3D; -90°, pitch &#x3D; 0°).123456Camera::Camera(const vec3f&amp; position, float initialYaw, float initialPitch) : m_yaw(initialYaw), m_pitch(initialPitch) &#123; m_transform.position = position; m_projMatrix = mat4::identity(); updateCameraVectors();&#125; Orientation: Uses yaw (Y-axis rotation) and pitch (X-axis rotation) to compute a quaternion-based rotation, avoiding gimbal lock compared to Euler angles. Movement: Supports keyboard-driven movement (WASD, Space, Ctrl) and mouse-driven look controls. View Matrix: Computed using a stable look-at construction based on the camera’s forward vector. Orientation and RotationThe camera’s orientation is defined by: Yaw: Rotation around the world’s Y-axis (up). Pitch: Rotation around the camera’s local X-axis (right). 
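To make the two angles concrete, here is a minimal sketch that turns yaw and pitch directly into a forward direction. It assumes a right-handed, Y-up frame in which yaw = 0 and pitch = 0 looks down -Z and positive pitch looks up; the engine's own zero-yaw direction may differ. `Vec3` and `forwardFromYawPitch` are hypothetical names for this illustration, not part of the codebase.

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch (not the engine's vec3f).
struct Vec3 { float x, y, z; };

// Forward direction for a yaw about world +Y and a pitch about the camera's
// local +X, both given in degrees.
// Assumed convention: yaw = 0, pitch = 0 looks down -Z; positive pitch looks up.
inline Vec3 forwardFromYawPitch(float yawDeg, float pitchDeg) {
    const float kDegToRad = 3.14159265358979f / 180.0f;
    const float yaw = yawDeg * kDegToRad;
    const float pitch = pitchDeg * kDegToRad;
    // Pitch tilts (0, 0, -1) up or down, then yaw turns the result about +Y.
    return Vec3{ -std::sin(yaw) * std::cos(pitch),
                  std::sin(pitch),
                 -std::cos(yaw) * std::cos(pitch) };
}
```

The result always has unit length, so it can be used as a look direction without further normalization. Composing a yaw rotation with a pitch rotation and applying the product to the -Z axis should give the same direction under the same convention.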
The updateRotationAndVectors method computes the rotation quaternion: void Camera::updateRotationAndVectors() &#123; quat yawQuat = quat::fromAxisAngle(m_worldUp, m_yaw * Q_DEG2RAD); vec3f localRight = vec3f&#123;1.0f, 0.0f, 0.0f&#125;; quat pitchQuat = quat::fromAxisAngle(localRight, m_pitch * Q_DEG2RAD); m_transform.rotation = yawQuat * pitchQuat; m_transform.rotation.normalize(); &#125; Yaw is applied first (global rotation), then pitch (local rotation), ensuring intuitive FPS controls. The rotation is normalized to prevent numerical drift. The view matrix is updated in updateViewMatrix using the camera’s forward, right, and up vectors, derived from the rotation: void Camera::updateViewMatrix() &#123; vec3f position = m_transform.position; vec3f forward = getForward(); vec3f targetPoint = position + forward; vec3f actualForward = (targetPoint - position).normalized(); /* Test the raw cross product before normalizing it, so that a near-zero vector is never normalized */ vec3f actualRight = actualForward.cross(m_worldUp); if (actualRight.lengthSq() &lt; 1e-6f) &#123; /* forward is (nearly) parallel to world up: fall back to the yaw-only right vector */ quat yawQuatOnly = quat::fromAxisAngle(m_worldUp, m_yaw * Q_DEG2RAD); actualRight = yawQuatOnly * vec3f&#123;1.0f, 0.0f, 0.0f&#125;; &#125; else &#123; actualRight = actualRight.normalized(); &#125; vec3f actualUp = actualRight.cross(actualForward).normalized(); mat4 rotation = mat4::identity(); rotation.m[0][0] = actualRight.x; rotation.m[0][1] = actualRight.y; rotation.m[0][2] = actualRight.z; rotation.m[1][0] = actualUp.x; rotation.m[1][1] = actualUp.y; rotation.m[1][2] = actualUp.z; rotation.m[2][0] = -actualForward.x; rotation.m[2][1] = -actualForward.y; rotation.m[2][2] = -actualForward.z; mat4 translation = mat4::translation(-position.x, -position.y, -position.z); m_viewMatrix = rotation * translation; &#125; This handles the degenerate case of looking straight up or down, where the forward vector is parallel to the world up vector and their cross product vanishes; the right vector then falls back to the yaw-only rotation so that the view basis stays well defined. 
Mouse LookMouse movement adjusts yaw and pitch: 12345678910111213void Camera::processMouseMovement(float xoffset, float yoffset, float sensitivity, bool constrainPitch) &#123; xoffset *= sensitivity; yoffset *= sensitivity; m_yaw += xoffset; m_pitch += yoffset; m_yaw = fmod(m_yaw, 360.0f); if (m_yaw &lt; 0.0f) m_yaw += 360.0f; if (constrainPitch) &#123; m_pitch = std::clamp(m_pitch, -89.0f, 89.0f); &#125; updateRotationAndVectors(); updateViewMatrix();&#125; Sensitivity: Scales mouse input for smoother control. Pitch Constraint: Limits pitch to ±89° to prevent flipping at the poles. Yaw Wrapping: Keeps yaw in [0, 360°) for continuity. Keyboard MovementKeyboard input moves the camera along its forward, right, and world-up axes: 1234567891011void Camera::processKeyboardMovement(const vec3f&amp; direction, float deltaTime, float speed) &#123; float velocity = speed * deltaTime; vec3f moveAmount = &#123;0.0f, 0.0f, 0.0f&#125;; vec3f currentForward = getForward(); vec3f horizontalRight = -m_worldUp.cross(currentForward).normalized(); moveAmount = moveAmount + currentForward * direction.z * velocity; moveAmount = moveAmount + horizontalRight * direction.x * velocity; moveAmount = moveAmount + m_worldUp * direction.y * velocity; m_transform.position = m_transform.position + moveAmount; updateViewMatrix();&#125; Direction: A vector where x is strafe (left&#x2F;right), y is vertical (up&#x2F;down), and z is forward&#x2F;backward. Delta Time: Ensures frame-rate-independent movement. Speed: Controls movement speed (default: 5 units&#x2F;second). 2. Input Handling in SDLAppThe SDLApp class (include/core/sdl_app.h, src/core/sdl_app.cpp) processes user input and updates the camera. 
Keyboard InputKeyboard state is tracked using an std::unordered_set for pressed keys: 1std::unordered_set&lt;SDL_Scancode&gt; keysPressed; The handleEvents method updates this set: 12345678910111213141516if (!io.WantCaptureKeyboard) &#123; switch (event.type) &#123; case SDL_KEYDOWN: if (event.key.repeat == 0) &#123; keysPressed.insert(event.key.keysym.scancode); if (event.key.keysym.scancode == SDL_SCANCODE_ESCAPE) &#123; mouseLookActive = !mouseLookActive; std::cout &lt;&lt; &quot;Escape Toggle: mouseLookActive = &quot; &lt;&lt; mouseLookActive &lt;&lt; std::endl; &#125; &#125; break; case SDL_KEYUP: keysPressed.erase(event.key.keysym.scancode); break; &#125;&#125; ImGui Integration: Input is ignored if ImGui (UI) wants keyboard focus. Escape Key: Toggles mouse look mode. The processInput method maps keys to movement: 123456789101112131415void SDLApp::processInput(float dt) &#123; vec3f moveDir = &#123;0.0f, 0.0f, 0.0f&#125;; if (keysPressed.count(SDL_SCANCODE_W)) moveDir.z += 1.0f; // Forward if (keysPressed.count(SDL_SCANCODE_S)) moveDir.z -= 1.0f; // Backward if (keysPressed.count(SDL_SCANCODE_A)) moveDir.x -= 1.0f; // Left if (keysPressed.count(SDL_SCANCODE_D)) moveDir.x += 1.0f; // Right if (keysPressed.count(SDL_SCANCODE_SPACE)) moveDir.y += 1.0f; // Up if (keysPressed.count(SDL_SCANCODE_LCTRL) || keysPressed.count(SDL_SCANCODE_RCTRL)) moveDir.y -= 1.0f; // Down if (moveDir.lengthSq() &gt; 1.0f) &#123; moveDir.normalize(); &#125; if (moveDir.x != 0.0f || moveDir.y != 0.0f || moveDir.z != 0.0f) &#123; scene.getCamera().processKeyboardMovement(moveDir, dt, cameraMoveSpeed); &#125;&#125; Normalization: Ensures diagonal movement (e.g., W+A) doesn’t exceed the intended speed. WASD Controls: Standard FPS movement (W: forward, S: backward, A: strafe left, D: strafe right). Vertical Movement: Space (up) and Ctrl (down) allow free-fly movement, typical in debug or creative modes. 
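The normalization and delta-time scaling described above can be checked in isolation. The sketch below is a standalone approximation of that logic and not the engine's code; `Vec3` and `scaledMove` are hypothetical names, and the input direction is assumed to hold -1, 0 or +1 per axis as in processInput.

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch (not the engine's vec3f).
struct Vec3 { float x, y, z; };

// Clamp the input direction to unit length so that diagonal input (e.g. W+D)
// is not faster than single-axis input, then scale by speed * dt so that the
// distance covered per second does not depend on the frame rate.
inline Vec3 scaledMove(Vec3 dir, float speed, float dt) {
    const float lenSq = dir.x * dir.x + dir.y * dir.y + dir.z * dir.z;
    if (lenSq > 1.0f) {
        const float inv = 1.0f / std::sqrt(lenSq);
        dir = Vec3{ dir.x * inv, dir.y * inv, dir.z * inv };
    }
    const float step = speed * dt;
    return Vec3{ dir.x * step, dir.y * step, dir.z * step };
}
```

With a speed of 5 units per second and a 0.5 second step, both forward-only and forward-plus-strafe input move the camera 2.5 units; two steps of 0.25 seconds cover the same distance as one step of 0.5 seconds.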
Mouse InputMouse look is enabled when mouseLookActive is true, using SDL’s relative mouse mode: 1234567891011if (mouseLookActive) &#123; int mouseXRel, mouseYRel; SDL_GetRelativeMouseState(&amp;mouseXRel, &amp;mouseYRel); if (mouseXRel != 0 || mouseYRel != 0) &#123; scene.getCamera().processMouseMovement( -static_cast&lt;float&gt;(mouseXRel), -static_cast&lt;float&gt;(mouseYRel), cameraLookSensitivity ); &#125;&#125; Relative Mouse Mode: Captures mouse movement without cursor bounds, ideal for FPS controls. Sensitivity: Adjustable via cameraLookSensitivity (default: 0.1). Cursor Visibility: Hidden when mouse look is active, shown otherwise. The handleEvents method toggles mouse mode and cursor visibility: 1234567bool shouldBeRelative = mouseLookActive &amp;&amp; !imguiCapturedMouseThisPoll;bool currentRelativeState = SDL_GetRelativeMouseMode();if (shouldBeRelative != currentRelativeState) &#123; if (SDL_SetRelativeMouseMode(shouldBeRelative ? SDL_TRUE : SDL_FALSE) == 0) &#123; SDL_ShowCursor(shouldBeRelative ? SDL_DISABLE : SDL_ENABLE); &#125;&#125; ImGui Compatibility: Mouse look is disabled if ImGui captures the mouse (e.g., for UI interaction). 3. Scene ConfigurationThe scene file (scenes/scene.yaml) initializes the camera: 123456789camera: position: [0, 1, 3] yaw: 0.0 pitch: 0.0 width: 800 height: 800 fov: 45.0 near: 0.1 far: 100.0 Position: Starting point (x, y, z). Yaw&#x2F;Pitch: Initial orientation. Perspective: Field of view (FOV), aspect ratio, and clipping planes. 
The Scene class (src/core/scene.cpp) loads these settings: 12345678910111213141516if (cameraNode) &#123; vec3f position = &#123;0.0f, 0.0f, 5.0f&#125;; float yaw = -90.0f; float pitch = 0.0f; if (cameraNode[&quot;position&quot;]) position = cameraNode[&quot;position&quot;].as&lt;std::vector&lt;float&gt;&gt;(); if (cameraNode[&quot;yaw&quot;]) yaw = cameraNode[&quot;yaw&quot;].as&lt;float&gt;(); if (cameraNode[&quot;pitch&quot;]) pitch = cameraNode[&quot;pitch&quot;].as&lt;float&gt;(); camera.setPosition(position); camera.setPitchYaw(pitch, yaw); float fov = 60.0f, aspect = 1.0f, near = 0.1f, far = 100.0f; if (cameraNode[&quot;fov&quot;]) fov = cameraNode[&quot;fov&quot;].as&lt;float&gt;(); if (cameraNode[&quot;width&quot;] &amp;&amp; cameraNode[&quot;height&quot;] &amp;&amp; cameraNode[&quot;height&quot;].as&lt;float&gt;() != 0) &#123; aspect = cameraNode[&quot;width&quot;].as&lt;float&gt;() / cameraNode[&quot;height&quot;].as&lt;float&gt;(); &#125; camera.setPerspective(fov, aspect, near, far);&#125; Defaults: Provides fallback values if YAML fields are missing. Flexible Aspect Ratio: Supports width&#x2F;height or direct aspect ratio. 4. 
ImGui IntegrationThe ImGui interface (sdl_app.cpp) displays camera properties and controls: 12345678910111213141516ImGui::Begin(&quot;Inspector&quot;);if (ImGui::CollapsingHeader(&quot;Camera&quot;, ImGuiTreeNodeFlags_DefaultOpen)) &#123; vec3f position = scene.getCamera().getPosition(); vec3f rotation = scene.getCamera().getTransform().rotation.toEulerAnglesZYX(); ImGui::InputFloat3(&quot;Position##Cam&quot;, &amp;position.x, &quot;%.2f&quot;, ImGuiInputTextFlags_ReadOnly); ImGui::InputFloat3(&quot;Rotation##Cam&quot;, &amp;rotation.x, &quot;%.2f&quot;, ImGuiInputTextFlags_ReadOnly); ImGui::DragFloat(&quot;Move Speed&quot;, &amp;cameraMoveSpeed, 0.1f, 0.1f, 100.0f); ImGui::DragFloat(&quot;Look Sensitivity&quot;, &amp;cameraLookSensitivity, 0.01f, 0.01f, 1.0f); bool mouseLookStatus = mouseLookActive; if (ImGui::Checkbox(&quot;Mouse Look Active (Esc)&quot;, &amp;mouseLookStatus)) &#123; mouseLookActive = mouseLookStatus; SDL_SetRelativeMouseMode(mouseLookActive ? SDL_TRUE : SDL_FALSE); SDL_ShowCursor(mouseLookActive ? SDL_DISABLE : SDL_ENABLE); &#125;&#125;ImGui::End(); Read-Only Display: Shows position and rotation for debugging. Adjustable Parameters: Allows tweaking movement speed and look sensitivity. Mouse Look Toggle: Mirrors the Escape key functionality. 5. Math OptimizationsThe math library (vector.h, quaternion.h, transform.cpp) supports the camera with: Vector3: Added lengthSq() for faster length checks without square roots:1float lengthSq() const &#123; return x * x + y * y + z * z; &#125; Quaternion: Improved numerical stability in toAxisAngle and toEulerAnglesZYX:1if (axis.lengthSq() &lt; 1e-6f) axis = vec3f(0.0f, 0.0f, 1.0f); Transform: Enhanced lookAt with robust handling of edge cases (e.g., parallel vectors). These optimizations reduce computational overhead and improve stability for camera calculations. 
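As a small illustration of why a squared length is enough for comparisons, the sketch below compares against a squared threshold instead of taking a square root. `Vec3`, `isShorterThan` and `isNearlyZero` are hypothetical helpers written for this example; only lengthSq() and the 1e-6 threshold come from the text above.

```cpp
// Hypothetical minimal vector type for this sketch; lengthSq() mirrors the
// method described in the text.
struct Vec3 {
    float x, y, z;
    float lengthSq() const { return x * x + y * y + z * z; }
};

// True when v is shorter than radius. Both sides are squared, so no square
// root is needed. Assumes radius >= 0.
inline bool isShorterThan(const Vec3& v, float radius) {
    return v.lengthSq() < radius * radius;
}

// True when v is too short to normalize safely. epsSq is a threshold on the
// squared length, matching the 1e-6f checks quoted in the text.
inline bool isNearlyZero(const Vec3& v, float epsSq = 1e-6f) {
    return v.lengthSq() < epsSq;
}
```

Because squaring is monotonic for non-negative values, the comparison gives the same answer as comparing true lengths while skipping the square root.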
Integration with the ApplicationThe FPS camera is integrated into the main loop in SDLApp::run (sdl_app.cpp): 123456789while (!quit) &#123; updateFPS(); processInput(deltaTime); update(deltaTime); renderFrame(); updateTextureFromFramebuffer(); renderImGui(); render();&#125; Input Processing: processInput updates the camera based on keyboard and mouse input. Rendering: The camera’s view and projection matrices are passed to the renderer for scene rendering. ImGui: Provides real-time feedback and control. Best Practices and Tips Frame-Rate Independence: Always scale movement by deltaTime to ensure consistent speed across hardware. Numerical Stability: Use lengthSq() for comparisons and normalize quaternions to prevent drift. User Comfort: Constrain pitch to avoid disorienting flips and provide adjustable sensitivity. ImGui Integration: Ensure input is disabled when ImGui is active to prevent conflicts. Debugging: Use ImGui to display camera state and log warnings for YAML parsing errors. Potential Enhancements Collision Detection: Prevent the camera from moving through objects. Smoothing: Add interpolation for smoother mouse look. Configurable Keybindings: Allow users to remap WASD controls. Camera Shake: Implement for visual effects (e.g., explosions). Field of View Adjustment: Add dynamic FOV for sprinting or zooming. ConclusionThis FPS camera implementation provides a robust foundation for 3D applications, with smooth mouse look, intuitive keyboard movement, and seamless integration with SDL and ImGui. By leveraging yaw&#x2F;pitch orientation, quaternion rotations, and optimized math utilities, the system ensures performance and stability. The provided codebase is extensible, making it easy to add features like collision detection or advanced input handling. 
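For further details, refer to the source files (camera.h/cpp, sdl_app.h/cpp, scene.cpp) and experiment with the scene configuration (scene.yaml) to customize the camera’s behavior.","tags":["C++","FPS","Graphics","Camera","3D","SDL"],"categories":["Computer Graphics"]},{"title":"Optimizing a Software Renderer: Thread Pool Design and Performance Gains","path":"/2025/04/18/ThreadPool/","content":"Performance is a central goal when developing a software renderer. Single-threaded rendering is limited by the performance of one CPU core, and the frame rate (FPS) tends to be low, especially at high resolutions or in complex scenes. By introducing a thread pool (ThreadPool) and parallelizing the key modules, we raised the renderer's average FPS from about 30 to 145 (at most 32 threads, 800x800 resolution). This article describes the design and implementation of the thread pool and how these optimizations improve rendering performance. Background A software renderer is a graphics rendering system that runs entirely on the CPU and includes compute-intensive stages such as vertex processing, rasterization and pixel shading. The single-threaded implementation averaged about 30 FPS, which is not enough for real-time rendering. The main bottlenecks were: Pixel fill: per-pixel color writes and depth tests are time-consuming. Vertex processing and rasterization: processing the triangles of complex models is computationally heavy. Texture update: copying the framebuffer into the SDL texture on a single thread is inefficient. To address these problems we designed and implemented a thread pool that parallelizes the compute-intensive work and makes full use of a multi-core CPU. The rest of the article focuses on the design and implementation of the thread pool and on how it is used in the renderer. Thread Pool Design and Implementation A thread pool is an efficient mechanism for managing multi-threaded work: a fixed number of threads is reused to run tasks, which avoids the cost of repeatedly creating and destroying threads. Our thread pool (the ThreadPool class) is designed for the renderer's parallel workloads; its main goals are to distribute tasks efficiently, guarantee thread safety, and provide a way to wait for task completion. The design and implementation are described below. 1. Design The design revolves around the following points: Task queue: a thread-safe queue stores the pending tasks (std::function&lt;void()&gt;) and allows tasks to be added dynamically. Thread management: a fixed number of threads is created (usually based on the hardware concurrency), and each thread keeps taking tasks from the queue and running them. Synchronization: a mutex (std::mutex) and condition variables (std::condition_variable) make task hand-off and completion notification thread-safe. Waiting for completion: the main thread can wait until all tasks have finished, which keeps the rendering pipeline in step. Exception safety: the pool must behave correctly in exceptional situations, such as a task being added after the pool has been stopped. Performance: lock contention and context switches are kept to a minimum to make task distribution and execution efficient. 2. Implementation The ThreadPool class is built on the C++11 threading library (std::thread, std::mutex, and so on). The core components are described in detail below. a. 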
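Class definition and members The core data of the thread pool consists of: Task queue: std::queue&lt;std::function&lt;void()&gt;&gt; tasks stores the pending tasks. Worker threads: std::vector&lt;std::thread&gt; workers holds the threads of the pool. Synchronization: std::mutex queueMutex protects access to the task queue. std::condition_variable condition notifies the workers of a new task or of the stop signal. std::condition_variable completionCondition notifies the main thread that all tasks have finished. State: bool stop controls whether the pool is stopping. std::atomic&lt;uint32_t&gt; activeTasks tracks the number of tasks currently running. Header definition: class ThreadPool &#123; public: ThreadPool(uint32_t numThreads); ~ThreadPool(); void enqueue(std::function&lt;void()&gt; task); void waitForCompletion(); uint32_t getNumThreads() const &#123; return static_cast&lt;uint32_t&gt;(workers.size()); &#125; private: std::vector&lt;std::thread&gt; workers; std::queue&lt;std::function&lt;void()&gt;&gt; tasks; std::mutex queueMutex; std::condition_variable condition; std::condition_variable completionCondition; bool stop; std::atomic&lt;uint32_t&gt; activeTasks; void workerThread(); &#125;; b. Constructor The constructor initializes the pool, creates the requested number of worker threads (numThreads) and binds each of them to the workerThread method. Implementation: ThreadPool::ThreadPool(uint32_t numThreads) : stop(false), activeTasks(0) &#123; for (uint32_t i = 0; i &lt; numThreads; ++i) &#123; workers.emplace_back(&amp;ThreadPool::workerThread, this); &#125; &#125; Choosing the thread count: numThreads is usually set to std::thread::hardware_concurrency() - 1, which leaves one core for the main thread and other system work. Because hardware_concurrency() may return 0 when the value cannot be determined, the result should be clamped to at least 1. The maximum thread count used in testing was 32. Thread creation: emplace_back constructs each std::thread in place, bound to the workerThread method, without creating a temporary object. c. Destructor The destructor stops the pool safely and makes sure every thread exits cleanly. Implementation: ThreadPool::~ThreadPool() &#123; &#123; std::unique_lock&lt;std::mutex&gt; lock(queueMutex); stop = true; &#125; condition.notify_all(); for (std::thread&amp; worker : workers) &#123; worker.join(); &#125; &#125; Stop signal: stop = true tells all threads to exit. Notification: condition.notify_all() wakes every thread that is waiting for a task. Thread reclamation: worker.join() waits for each thread to exit so that its resources are released correctly. d. 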
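Enqueuing tasks The enqueue method adds a task to the queue and wakes one idle thread to run it. Implementation: void ThreadPool::enqueue(std::function&lt;void()&gt; task) &#123; &#123; std::unique_lock&lt;std::mutex&gt; lock(queueMutex); if (stop) &#123; throw std::runtime_error(&quot;Cannot enqueue task: ThreadPool is stopped&quot;); &#125; tasks.emplace(std::move(task)); &#125; condition.notify_one(); &#125; Thread safety: std::unique_lock protects the task queue against concurrent modification. Stop check: if the pool has already been stopped (stop == true), an exception is thrown instead of queuing a task that would never run. Efficient notification: condition.notify_one() wakes only one waiting thread, which avoids unnecessary context switches. e. Worker thread workerThread is the loop run by every thread; it keeps taking tasks from the queue and executing them. Implementation: void ThreadPool::workerThread() &#123; while (true) &#123; std::function&lt;void()&gt; task; &#123; std::unique_lock&lt;std::mutex&gt; lock(queueMutex); condition.wait(lock, [this] &#123; return stop || !tasks.empty(); &#125;); if (stop &amp;&amp; tasks.empty()) &#123; return; &#125; task = std::move(tasks.front()); tasks.pop(); activeTasks++; /* count the task as active before the lock is released */ &#125; task(); &#123; std::unique_lock&lt;std::mutex&gt; lock(queueMutex); activeTasks--; if (activeTasks == 0 &amp;&amp; tasks.empty()) &#123; completionCondition.notify_all(); &#125; &#125; &#125; &#125; Fetching a task: condition.wait blocks until a task is available or the stop signal is set, using the predicate stop || !tasks.empty(). If stop == true and the queue is empty, the thread exits. std::move transfers the task out of the queue without copying it. Running the task: the task runs outside the critical section (task()), so the lock is not held for long. activeTasks tracks the number of running tasks; it is incremented while the queue lock is still held, because otherwise waitForCompletion could observe an empty queue and a zero counter in the window between a worker popping the last task and starting it, and return too early. Completion notification: when a task finishes, activeTasks is decremented. If activeTasks == 0 and the queue is empty, the main thread is notified that all tasks are done. f. Waiting for completion The waitForCompletion method lets the main thread wait until every task has finished. Implementation: void ThreadPool::waitForCompletion() &#123; std::unique_lock&lt;std::mutex&gt; lock(queueMutex); completionCondition.wait(lock, [this] &#123; return activeTasks == 0 &amp;&amp; tasks.empty(); &#125;); &#125; Wait condition: waits for activeTasks == 0 (no task is running) and tasks.empty() (the queue is empty). Efficient synchronization: completionCondition lets the main thread sleep instead of busy-waiting. 3. Using the Thread Pool in the Software Renderer The thread pool is integrated into several modules of the rendering pipeline to parallelize compute-intensive work. The main use cases are described below: a. 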
Framebuffer 的多线程优化Framebuffer 负责像素填充、深度测试和帧缓冲区翻转。我们引入了以下优化： 像素锁机制：为避免多线程写入同一像素的竞争，设计了一个固定大小的锁池（std::vector&lt;std::mutex&gt;，大小为 LOCK_POOL_SIZE = 2047）。通过哈希函数 getLockIndex(x, y) 将像素坐标映射到锁池中的互斥锁，实现细粒度同步。 垂直翻转并行化： flipVertical 方法将帧缓冲区的行分成若干组（rowsPerThread），每个线程处理一部分行，通过线程池并行执行。 任务分配：线程池将翻转任务分解为小块（每线程处理 rowsPerThread 行），确保负载均衡。 代码示例： 1234567891011121314151617181920void Framebuffer::flipVertical() &#123;#ifdef MultiThreading uint32_t numThreads = threadPool.getNumThreads(); numThreads = std::max(1u, numThreads); int rowsPerThread = (height / 2 + numThreads - 1) / numThreads; for (int startY = 0; startY &lt; height / 2; startY += rowsPerThread) &#123; int endY = std::min(startY + rowsPerThread, height / 2); threadPool.enqueue([this, startY, endY]() &#123; for (int y = startY; y &lt; endY; ++y) &#123; for (int x = 0; x &lt; width; ++x) &#123; std::swap(pixels[y * width + x], pixels[(height - 1 - y) * width + x]); &#125; &#125; &#125;); &#125; threadPool.waitForCompletion();#else // 单线程实现#endif&#125; b. 
Renderer 的多线程优化 Renderer 负责顶点处理、三角形光栅化和绘制。我们将三角形处理并行化： 三角形分配：将模型的三角形（numFaces）分成若干组（facesPerThread），通过线程池分配给多个线程。 任务划分：每个任务处理一定范围的三角形（startFace 到 endFace），包括顶点处理、透视除法和光栅化。 同步：通过 waitForCompletion 确保所有三角形处理完成后再进入下一阶段。 代码示例： 1234567891011121314151617181920void Renderer::drawModel(const Model&amp; model, const Transform&amp; transform, const Material&amp; material) &#123; int numFaces = static_cast&lt;int&gt;(model.numFaces());#ifdef MultiThreading int maxThreads = threadPool.getNumThreads(); int numThreads = std::max(1, std::min(maxThreads, numFaces)); int facesPerThread = (numFaces + numThreads - 1) / numThreads; for (int startFace = 0; startFace &lt; numFaces; startFace += facesPerThread) &#123; int endFace = std::min(startFace + facesPerThread, numFaces); threadPool.enqueue([this, &amp;model, &amp;material, startFace, endFace]() &#123; for (int i = startFace; i &lt; endFace; ++i) &#123; // 三角形处理逻辑 drawTriangle(screenVertices[0], screenVertices[1], screenVertices[2], material); &#125; &#125;); &#125; threadPool.waitForCompletion();#else // 单线程实现#endif&#125; c. SDLApp 的纹理更新并行化SDLApp 负责将帧缓冲区数据传输到 SDL 纹理。我们将 updateTextureFromFramebuffer 并行化： 行分配：将帧缓冲区的行分成若干组（rowsPerThread），每个线程处理一部分行。 颜色转换：线程池并行执行颜色值从浮点（vec3f）到 Uint8 的转换。 代码示例： 1234567891011121314151617181920212223242526void SDLApp::updateTextureFromFramebuffer(const Framebuffer&amp; framebuffer) &#123; // ... 
SDL 纹理锁定 ...#ifdef MultiThreading uint32_t numThreads = threadPool.getNumThreads(); numThreads = std::max(1u, numThreads); int rowsPerThread = (height + numThreads - 1) / numThreads; for (int startY = 0; startY &lt; height; startY += rowsPerThread) &#123; int endY = std::min(startY + rowsPerThread, height); threadPool.enqueue([this, &amp;framebuffer, dstPixels, pitch, &amp;pixels, startY, endY]() &#123; for (int y = startY; y &lt; endY; ++y) &#123; for (int x = 0; x &lt; width; ++x) &#123; const vec3f&amp; color = pixels[y * width + x]; Uint8* dstPixel = dstPixels + y * pitch + x * 3; dstPixel[0] = static_cast&lt;Uint8&gt;(std::round(color.x * 255.0f)); dstPixel[1] = static_cast&lt;Uint8&gt;(std::round(color.y * 255.0f)); dstPixel[2] = static_cast&lt;Uint8&gt;(std::round(color.z * 255.0f)); &#125; &#125; &#125;); &#125; threadPool.waitForCompletion();#else // 单线程实现#endif // ... SDL 纹理解锁 ...&#125; 4. 实现细节与优化 任务粒度：任务被划分为较小单元（例如每线程处理若干行或三角形），通过向上取整（ceiling division）确保负载均衡。 锁优化：任务队列操作使用 std::unique_lock 最小化锁持有时间，任务执行在临界区外进行，降低锁竞争。 条件变量：condition 和 completionCondition 分别用于任务分配和完成通知，避免忙等待。 异常处理：在 enqueue 中检查 stop 状态，防止向已停止的线程池添加任务。 条件编译：通过 #ifdef MultiThreading 支持单线程和多线程模式，便于调试和兼容性测试。 性能提升分析通过线程池和多线程优化，软渲染器的平均 FPS 从 30 提升至约 145（分辨率 800x800，最大线程数 32）。以下是性能提升的关键因素： 并行化计算密集型任务： 顶点处理和光栅化通过线程池并行执行，显著减少了 Renderer::drawModel 的耗时。 帧缓冲区翻转和纹理更新并行化，降低了 Framebuffer::flipVertical 和 SDLApp::updateTextureFromFramebuffer 的延迟。 细粒度锁机制： Framebuffer 的锁池（pixelLocks）通过哈希映射减少锁竞争，确保像素写入的线程安全。 负载均衡： 任务按行或三角形均匀分配，最大化利用多核 CPU。 线程复用： 线程池避免频繁创建和销毁线程，减少上下文切换开销。 性能数据 测试环境：分辨率 800x800，32 线程，复杂场景（多个模型、灯光和纹理）。 单线程 FPS：约 30。 多线程 FPS：约 145（提升约 4.83 倍）。 瓶颈分析：多线程模式下，SDL 的 SDL_RenderPresent 成为新瓶颈，受限于其单线程设计。 结论通过设计高效的线程池并优化 Framebuffer、Renderer 和 SDLApp，我们将软渲染器的平均 FPS 从 30 提升至 145，性能提升约 4.83 倍。线程池通过任务队列、线程复用、细粒度同步和负载均衡，充分发挥了多核 CPU 的潜力。这些优化展示了多线程编程在软渲染器中的价值，同时为未来改进（如 GPU 加速）奠定了基础。 欢迎讨论线程池实现细节或软渲染器的进一步优化！","tags":["C++","软渲染器","线程池","多线程","性能优化"],"categories":["技术"]},{"title":"使用 SDL 窗口化实时渲染：设计 Scene 和 SDLApp 
组件","path":"/2025/04/16/SDL/","content":"使用 SDL 窗口化实时渲染：设计 Scene 和 SDLApp 组件在开发实时渲染应用时，SDL（Simple DirectMedia Layer）是一个轻量且跨平台的库，广泛用于创建窗口、处理输入和显示渲染结果。本文分享了一个基于 SDL 的软件光栅化渲染器的设计与实现，重点介绍如何通过模块化的组件（如 Scene 和 SDLApp）实现窗口化实时渲染。我们将从架构设计、组件实现、场景加载到具体代码细节逐步展开，适合对图形学、游戏开发或系统设计感兴趣的开发者参考。 背景与目标目标是构建一个软件光栅化渲染器，支持加载 3D 模型（OBJ 格式）、应用 Blinn-Phong 着色、处理光照和纹理，并通过 SDL 窗口实时显示渲染结果。核心需求包括： 模块化设计：将渲染逻辑与窗口&#x2F;输入处理分离。 灵活的场景管理：支持通过配置文件（如 YAML）动态加载场景。 实时交互：实现流畅的渲染循环，支持动画和用户输入。 跨平台兼容：利用 SDL 确保代码在 Windows、Linux 等平台上运行。 最终实现了一个渲染器，能够加载非洲头模型（african_head.obj）及其纹理，应用旋转动画，并通过 SDL 窗口显示，帧率信息实时更新在窗口标题栏。 架构设计为了实现上述目标，我们设计了以下核心组件： SDLApp：负责 SDL 窗口管理、事件处理和渲染循环。 Scene：管理渲染相关的数据（如模型、材质、光照）和逻辑。 Renderer：执行光栅化渲染，将场景绘制到帧缓冲区。 Framebuffer：存储渲染结果的像素数据，供 SDL 显示。 Camera、Light、Model、Material：场景的子组件，定义视角、光照、几何和材质。 架构图如下： 123456789101112131415161718192021+------------------+| SDLApp || - Window || - Renderer || - Texture || - Event Loop || - FPS Counter |+------------------+ | v+------------------+| Scene || - Framebuffer || - Renderer || - Camera || - Lights || - Objects || - Model || - Material || - Transform |+------------------+ 设计原则 关注点分离：SDLApp 只处理窗口和输入，Scene 专注于渲染逻辑。 松耦合：通过回调机制连接 SDLApp 和 Scene，避免直接依赖。 可扩展性：使用 YAML 配置文件加载场景，支持动态修改。 简洁性：保持接口清晰，代码易于维护和扩展。 组件设计与实现1. 
SDLApp：窗口与渲染循环 SDLApp 是应用的入口，负责初始化 SDL、创建窗口、管理渲染循环和处理输入事件。其核心职责包括： 初始化 SDL 窗口和渲染器。 创建流式纹理（SDL_Texture）用于显示帧缓冲区。 运行主循环，处理事件、更新 FPS 和显示渲染结果。 通过回调与 Scene 交互，获取渲染后的帧缓冲区。 关键代码 SDLApp 的头文件定义如下： class SDLApp &#123; public: SDLApp(int width, int height, const std::string&amp; title); ~SDLApp(); bool initialize(); void run(const std::function&lt;const Framebuffer&amp;(float)&gt;&amp; renderCallback); void updateTextureFromFramebuffer(const Framebuffer&amp; framebuffer); SDL_Renderer* getRenderer(); private: int width; int height; std::string title; SDL_Window* window; SDL_Renderer* sdlRenderer; SDL_Texture* framebufferTexture; bool quit; float deltaTime; int frameCount; float fps; Uint32 lastFrameTime; Uint32 fpsUpdateTimer; void handleEvents(); void updateFPS();&#125;; run 方法实现了主循环，调用渲染回调获取帧缓冲区，更新纹理后将其绘制到窗口（签名与头文件声明保持一致）： void SDLApp::run(const std::function&lt;const Framebuffer&amp;(float)&gt;&amp; renderCallback) &#123; while (!quit) &#123; handleEvents(); updateFPS(); const Framebuffer&amp; framebuffer = renderCallback(deltaTime); updateTextureFromFramebuffer(framebuffer); SDL_RenderClear(sdlRenderer); SDL_RenderCopy(sdlRenderer, framebufferTexture, nullptr, nullptr); SDL_RenderPresent(sdlRenderer); &#125;&#125; updateTextureFromFramebuffer 将帧缓冲区的像素数据复制到 SDL 纹理（这里假设 Framebuffer::getPixels() 提供 const 版本）： void SDLApp::updateTextureFromFramebuffer(const Framebuffer&amp; framebuffer) &#123; void* texturePixels; int pitch; if (SDL_LockTexture(framebufferTexture, NULL, &amp;texturePixels, &amp;pitch) != 0) &#123; std::cerr &lt;&lt; &quot;SDL_LockTexture failed: &quot; &lt;&lt; SDL_GetError() &lt;&lt; std::endl; return; &#125; Uint8* dstPixels = static_cast&lt;Uint8*&gt;(texturePixels); const auto&amp; pixels = framebuffer.getPixels(); for (int y = 0; y &lt; height; ++y) &#123; for (int x = 0; x &lt; width; ++x) &#123; int framebufferY = y; const vec3f&amp; color = pixels[framebufferY * width + x]; Uint8* dstPixel = dstPixels + y * pitch + x * 3; dstPixel[0] = 
static_cast&lt;Uint8&gt;(std::max(0.0f, std::min(255.0f, std::round(color.x * 255.0f)))); dstPixel[1] = static_cast&lt;Uint8&gt;(std::max(0.0f, std::min(255.0f, std::round(color.y * 255.0f)))); dstPixel[2] = static_cast&lt;Uint8&gt;(std::max(0.0f, std::min(255.0f, std::round(color.z * 255.0f)))); &#125; &#125; SDL_UnlockTexture(framebufferTexture);&#125; 设计亮点 回调机制：通过 std::function&lt;Framebuffer&amp;(float)&gt; 解耦 SDLApp 和渲染逻辑，允许任意组件提供帧缓冲区。 事件封装：handleEvents 和 updateFPS 是私有方法，仅在 run 中调用，确保外部无法误用。 FPS 显示：每秒更新窗口标题，显示实时帧率，方便性能监控。 2. Scene：场景管理与渲染Scene 负责管理渲染相关的数据和逻辑，包括帧缓冲区、渲染器、相机、光源和场景对象。它通过 YAML 配置文件加载场景，支持动态模型、材质和动画。 数据结构Scene 使用 SceneObject 结构体表示场景中的对象： 123456789struct SceneObject &#123; Model model; Material material; Transform transform; struct Animation &#123; enum class Type &#123; None, RotateY &#125; type = Type::None; float speed = 0.0f; &#125; animation;&#125;; Scene 类定义如下： 12345678910111213141516class Scene &#123;public: Scene(int width, int height); bool loadFromYAML(const std::string&amp; filename); void update(float deltaTime); void render(); Framebuffer&amp; getFramebuffer();private: Framebuffer framebuffer; Renderer renderer; Camera camera; std::vector&lt;Light&gt; lights; std::vector&lt;SceneObject&gt; objects; void initializeDefaultScene();&#125;; YAML 场景加载场景通过 YAML 文件定义，包含相机、光源和对象。例如： 12345678910111213141516171819202122232425262728camera: position: [0, 1, 3] target: [0, 0, 0] up: [0, 1, 0] fov: 45.0 near: 0.1 far: 100.0lights: - type: directional direction: [0.707, 0.0, -0.707] color: [1.0, 1.0, 1.0] intensity: 1.0objects: - name: head model: resources/obj/african_head.obj material: shader: blinn_phong diffuse_texture: resources/diffuse/african_head_diffuse.tga normal_texture: resources/normal_tangent/african_head_nm_tangent.tga specular_texture: resources/spec/african_head_spec.tga transform: position: [0.0, 0.0, 0.0] rotation: [0.0, 0.0, 0.0] scale: [1.0, 1.0, 1.0] animation: type: rotate_y speed: 30.0 loadFromYAML 方法解析 YAML 文件，初始化相机、光源和对象： 
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091bool Scene::loadFromYAML(const std::string&amp; filename) &#123; try &#123; YAML::Node config = YAML::LoadFile(filename); // Load camera auto cameraNode = config[&quot;camera&quot;]; if (cameraNode) &#123; vec3f position = cameraNode[&quot;position&quot;].as&lt;std::vector&lt;float&gt;&gt;(); vec3f target = cameraNode[&quot;target&quot;].as&lt;std::vector&lt;float&gt;&gt;(); vec3f up = cameraNode[&quot;up&quot;].as&lt;std::vector&lt;float&gt;&gt;(); camera = Camera(position, target, up); camera.setPerspective( cameraNode[&quot;fov&quot;].as&lt;float&gt;(), static_cast&lt;float&gt;(framebuffer.getWidth()) / framebuffer.getHeight(), cameraNode[&quot;near&quot;].as&lt;float&gt;(), cameraNode[&quot;far&quot;].as&lt;float&gt;() ); renderer.setCamera(camera); &#125; // Load lights lights.clear(); auto lightsNode = config[&quot;lights&quot;]; if (lightsNode) &#123; for (const auto&amp; lightNode : lightsNode) &#123; Light light; std::string type = lightNode[&quot;type&quot;].as&lt;std::string&gt;(); if (type == &quot;directional&quot;) light.type = LightType::DIRECTIONAL; else if (type == &quot;point&quot;) light.type = LightType::POINT; else continue; light.color = lightNode[&quot;color&quot;].as&lt;std::vector&lt;float&gt;&gt;(); light.intensity = lightNode[&quot;intensity&quot;].as&lt;float&gt;(); if (light.type == LightType::DIRECTIONAL) &#123; light.direction = vec3f(lightNode[&quot;direction&quot;].as&lt;std::vector&lt;float&gt;&gt;()).normalized(); &#125; else &#123; light.position = lightNode[&quot;position&quot;].as&lt;std::vector&lt;float&gt;&gt;(); &#125; lights.push_back(light); &#125; renderer.setLights(lights); &#125; // Load objects objects.clear(); auto objectsNode = config[&quot;objects&quot;]; if (objectsNode) &#123; for (const auto&amp; objNode : objectsNode) &#123; SceneObject obj; 
std::string modelPath = objNode[&quot;model&quot;].as&lt;std::string&gt;(); if (!obj.model.loadFromObj(modelPath)) &#123; std::cerr &lt;&lt; &quot;Failed to load model: &quot; &lt;&lt; modelPath &lt;&lt; std::endl; continue; &#125; auto matNode = objNode[&quot;material&quot;]; obj.material = Material(std::make_shared&lt;BlinnPhongShader&gt;()); if (matNode[&quot;diffuse_texture&quot;]) &#123; obj.material.loadDiffuseTexture(matNode[&quot;diffuse_texture&quot;].as&lt;std::string&gt;()); &#125; if (matNode[&quot;normal_texture&quot;]) &#123; obj.material.loadNormalTexture(matNode[&quot;normal_texture&quot;].as&lt;std::string&gt;()); &#125; if (matNode[&quot;specular_texture&quot;]) &#123; obj.material.loadSpecularTexture(matNode[&quot;specular_texture&quot;].as&lt;std::string&gt;()); &#125; auto transformNode = objNode[&quot;transform&quot;]; if (transformNode) &#123; if (transformNode[&quot;position&quot;]) &#123; obj.transform.setPosition(transformNode[&quot;position&quot;].as&lt;std::vector&lt;float&gt;&gt;()); &#125; if (transformNode[&quot;rotation&quot;]) &#123; obj.transform.setRotationEulerZYX(transformNode[&quot;rotation&quot;].as&lt;std::vector&lt;float&gt;&gt;()); &#125; if (transformNode[&quot;scale&quot;]) &#123; obj.transform.setScale(transformNode[&quot;scale&quot;].as&lt;std::vector&lt;float&gt;&gt;()); &#125; &#125; auto animNode = transformNode[&quot;animation&quot;]; if (animNode &amp;&amp; animNode[&quot;type&quot;]) &#123; std::string animType = animNode[&quot;type&quot;].as&lt;std::string&gt;(); if (animType == &quot;rotate_y&quot;) &#123; obj.animation.type = SceneObject::Animation::Type::RotateY; obj.animation.speed = animNode[&quot;speed&quot;].as&lt;float&gt;(); &#125; &#125; objects.push_back(obj); &#125; &#125; return true; &#125; catch (const YAML::Exception&amp; e) &#123; std::cerr &lt;&lt; &quot;Error parsing YAML file: &quot; &lt;&lt; e.what() &lt;&lt; std::endl; initializeDefaultScene(); return false; &#125;&#125; 设计亮点 泛化场景：通过 
std::vector 管理任意数量的模型，取代硬编码的特定模型。 动态加载：YAML 文件定义场景，易于修改和扩展，无需更改代码。 动画支持：通过 Animation 结构体实现简单的旋转动画，可扩展到更多类型。 3. 其他组件 Renderer：执行光栅化管线，处理顶点变换、裁剪、光栅化和片段着色。支持 Blinn-Phong 着色模型。 Framebuffer：存储渲染结果的像素和深度数据，支持颜色清除和垂直翻转（适配 SDL 坐标系）。 Camera：提供视角变换矩阵，支持透视投影。 Light：支持方向光和点光，传递给着色器。 Model 和 Material：加载 OBJ 模型和 TGA 纹理，支持法线贴图和镜面贴图。 这些组件与 Scene 紧密协作，共同完成渲染任务。 集成与运行main.cpp 负责初始化和启动应用： 1234567891011121314151617181920int main(int argc, char* argv[]) &#123; const int width = 800; const int height = 800; SDLApp app(width, height, &quot;Software Rasterizer&quot;); if (!app.initialize()) &#123; std::cerr &lt;&lt; &quot;Failed to initialize SDLApp&quot; &lt;&lt; std::endl; return 1; &#125; Scene scene(width, height); if (!scene.loadFromYAML(&quot;scenes/scene.yaml&quot;)) &#123; std::cerr &lt;&lt; &quot;Failed to load scene, using default&quot; &lt;&lt; std::endl; &#125; app.run([&amp;scene](float deltaTime) -&gt; Framebuffer&amp; &#123; scene.update(deltaTime); scene.render(); return scene.getFramebuffer(); &#125;); std::cout &lt;&lt; &quot;Exiting...&quot; &lt;&lt; std::endl; return 0;&#125; 构建与依赖使用 CMake 管理构建，依赖 SDL2 和 yaml-cpp： 12345678910111213cmake_minimum_required(VERSION 3.10)project(SoftRasterizer)set(CMAKE_CXX_STANDARD 17)find_package(SDL2 REQUIRED)add_subdirectory(thirdparty/yaml-cpp)file(GLOB SOURCES &quot;src/*.cpp&quot; &quot;src/core/*.cpp&quot; &quot;src/io/*.cpp&quot; &quot;src/math/*.cpp&quot;)add_executable(SoftRasterizer $&#123;SOURCES&#125;)target_link_libraries(SoftRasterizer PRIVATE SDL2::SDL2 SDL2::SDL2main yaml-cpp)target_include_directories(SoftRasterizer PRIVATE include $&#123;SDL2_INCLUDE_DIRS&#125;)add_custom_command(TARGET SoftRasterizer POST_BUILD COMMAND $&#123;CMAKE_COMMAND&#125; -E copy_directory $&#123;CMAKE_SOURCE_DIR&#125;/resources $&#123;CMAKE_BINARY_DIR&#125;/bin/resources COMMAND $&#123;CMAKE_COMMAND&#125; -E copy_if_different $&#123;CMAKE_SOURCE_DIR&#125;/scenes/scene.yaml $&#123;CMAKE_BINARY_DIR&#125;/bin/scenes/scene.yaml) 实现效果运行后，程序加载 
scene.yaml，渲染非洲头模型及其眼睛，应用 Blinn-Phong 着色和纹理。模型以每秒 30 度的速度绕 Y 轴旋转，窗口标题显示实时 FPS。效果如下： 渲染质量：支持法线贴图、镜面高光，视觉效果逼真。 性能：软件光栅化在 800x800 分辨率下流畅运行，FPS ~&#x3D; 30。 总结与展望通过模块化的 SDLApp 和 Scene 设计，我们实现了一个灵活的实时渲染器。SDLApp 封装了窗口和输入逻辑，Scene 通过 YAML 提供动态场景管理，回调机制确保了两者的松耦合。以下是未来可改进的方向： 事件处理：扩展 SDLApp 支持键盘和鼠标输入，实现相机控制。 多场景支持：允许运行时切换 YAML 文件，加载不同场景。 渲染优化：添加 SIMD 指令（如 AVX2）加速光栅化。 更复杂动画：支持关键帧动画或骨骼动画。","tags":["SDL","实时渲染","软件光栅化","场景管理"],"categories":["图形学","游戏开发"]},{"title":"使用四元数和变换类管理对象姿态","path":"/2025/04/15/QUATERNION/","content":"在之前的开发中，我们通过直接传递一个 mat4 modelMatrix 到渲染函数来表示模型的位置、旋转和缩放。虽然可行，但这种方式不够直观，难以单独修改某个变换分量（如只改变旋转），并且在处理旋转时可能遇到欧拉角的固有问题。为了更优雅、健壮地管理对象姿态，我们引入了 Transform 类，并使用四元数 (Quaternion) 来处理旋转。本文将详细介绍四元数的基本原理、与欧拉角的转换关系，以及如何通过 Transform 类来封装这些操作。 详细内容可参考这篇PDF： https://krasjet.github.io/quaternion/quaternion.pdf 1. 旋转的挑战：欧拉角与万向锁我们通常习惯使用欧拉角（如绕 X、Y、Z 轴旋转的角度）来描述旋转，因为它非常直观。然而，欧拉角存在一个著名的问题——万向锁 (Gimbal Lock)。 当按顺序应用三个旋转时（例如，先绕 Z 轴，再绕新 X 轴，最后绕新 Y 轴），中间的旋转（绕 X 轴）可能恰好使得最后一个旋转轴（Y 轴）与第一个旋转轴（Z 轴）重合。这时，无论如何调整第一个和最后一个角度，都只能在同一个平面上旋转，丢失了一个旋转自由度，导致无法实现某些期望的旋转组合。 （示意图：万向锁现象） 2. 更好的选择：四元数 (Quaternion)四元数提供了一种更稳健、更高效的方式来表示三维空间中的旋转。 2.1 基本定义一个四元数 q 可以表示为：q = w + xi + yj + zk其中 w 是实部（标量部分），(x, y, z) 是虚部（向量部分），i, j, k 是虚数单位，满足以下关系： i^2 = j^2 = k^2 = ijk = -1 ij = k, ji = -k jk = i, kj = -i ki = j, ik = -j 在我们的 C++ 实现中，通常用四个浮点数表示： 12345class Quaternion &#123;public: float w, x, y, z; // ... 
methods ...&#125;; 2.2 轴角表示法 任意一个三维旋转都可以表示为绕某个单位向量轴 a = (ax, ay, az) 旋转角度 θ。这可以方便地转换为四元数： w = cos(θ / 2) x = ax * sin(θ / 2) y = ay * sin(θ / 2) z = az * sin(θ / 2) 或者写成 q = (cos(θ/2), sin(θ/2) * a)。 用于表示旋转的四元数必须是单位四元数，即其模长为 1。 2.3 关键运算与推导 模长 (Magnitude): |q| = sqrt(w^2 + x^2 + y^2 + z^2) 对于单位四元数，|q| &#x3D; 1。 共轭 (Conjugate): q* = w - xi - yj - zk = (w, -x, -y, -z) 逆 (Inverse): q^-1 = q* / |q|^2 对于单位四元数 (旋转四元数)，|q|^2 = 1，因此其逆就是其共轭：q^-1 = q*。 乘法 (Multiplication - 旋转的组合): 四元数乘法不满足交换律 (q1 * q2 != q2 * q1)。它表示旋转的组合，q_total = q2 * q1 表示先应用 q1 旋转，再应用 q2 旋转。 设 q1 = (w1, x1, y1, z1) 和 q2 = (w2, x2, y2, z2)。 q1 * q2 = (w1 + x1i + y1j + z1k) * (w2 + x2i + y2j + z2k) 展开并使用 i, j, k 的关系，得到： w = w1w2 - x1x2 - y1y2 - z1z2 x = w1x2 + x1w2 + y1z2 - z1y2 y = w1y2 - x1z2 + y1w2 + z1x2 z = w1z2 + x1y2 - y1x2 + z1w2 这对应了 Quaternion::operator* 的实现。 向量旋转: 使用单位四元数 q 旋转向量 v，可以通过以下公式计算： v&#39; = q * v * q^-1 这里需要将向量 v = (vx, vy, vz) 表示为一个纯虚四元数 p = (0, vx, vy, vz)。计算过程是先计算 p&#39; = q * p，再计算 v&#39; = p&#39; * q^-1。结果 v&#39; 也是一个纯虚四元数，其虚部就是旋转后的向量。 展开这个公式可以得到一个更直接的计算方法（假设 q 已归一化）： 令 q_vec = (x, y, z) v&#39; = v + 2w * (q_vec × v) + 2 * (q_vec × (q_vec × v)) 这对应了 Quaternion::operator*(vec3f) 的优化实现。 2.4 转换为旋转矩阵 单位四元数可以方便地转换为 3x3 或 4x4 的旋转矩阵。对应的 4x4 旋转矩阵 M 为： $$M &#x3D; \\begin{pmatrix}1 - 2(y^2 + z^2) &amp; 2(xy - zw) &amp; 2(xz + yw) &amp; 0 \\\\ 2(xy + zw) &amp; 1 - 2(x^2 + z^2) &amp; 2(yz - xw) &amp; 0 \\\\ 2(xz - yw) &amp; 2(yz + xw) &amp; 1 - 2(x^2 + y^2) &amp; 0 \\\\ 0 &amp; 0 &amp; 0 &amp; 1\\end{pmatrix}$$ 这对应了 Quaternion::toMatrix() 和 mat4::fromQuaternion() 的实现。 3. 
桥接便利性：欧拉角与四元数的转换 虽然四元数内部计算优势明显，但用户输入和调试时，欧拉角更直观。因此我们需要实现两者之间的转换。 3.1 转换约定 欧拉角的转换结果依赖于旋转顺序和是内旋 (Intrinsic) 还是外旋 (Extrinsic)。本文沿用代码中的 “ZYX” 命名，实际采用的组合是 q = q_yaw * q_pitch * q_roll。按外旋（固定的世界轴）理解：先绕世界 Z 轴旋转 Roll 角，再绕世界 X 轴旋转 Pitch 角，最后绕世界 Y 轴旋转 Yaw 角。等效地，按内旋（物体局部轴）理解时顺序相反：先绕局部 Y 轴 (Yaw)，再绕新的局部 X 轴 (Pitch)，最后绕更新后的局部 Z 轴 (Roll)。 3.2 欧拉角 -&gt; 四元数 (q = q_yaw * q_pitch * q_roll) 设欧拉角为 (pitch, yaw, roll)，分别对应绕 X, Y, Z 轴的旋转角度。对应的三个单轴旋转四元数分别为： q_roll = (cos(roll/2), 0, 0, sin(roll/2)) (绕 Z) q_pitch = (cos(pitch/2), sin(pitch/2), 0, 0) (绕 X) q_yaw = (cos(yaw/2), 0, sin(yaw/2), 0) (绕 Y) 最终的组合旋转（按 Roll -&gt; Pitch -&gt; Yaw 的顺序应用）对应的四元数为：q = q_yaw * q_pitch * q_roll 展开这个乘法（注意顺序），令 cy = cos(yaw/2), sy = sin(yaw/2), cp = cos(pitch/2), sp = sin(pitch/2), cr = cos(roll/2), sr = sin(roll/2)，可以推导出： w = cr*cp*cy + sr*sp*sy x = cr*sp*cy + sr*cp*sy (对应 Pitch) y = cr*cp*sy - sr*sp*cy (对应 Yaw) z = sr*cp*cy - cr*sp*sy (对应 Roll) 这正是 Quaternion::fromEulerAnglesZYX(vec3f(pitch, yaw, roll)) 的实现依据（注意函数参数的约定）。 3.3 四元数 -&gt; 欧拉角 (q = q_yaw * q_pitch * q_roll) 从单位四元数 q = (w, x, y, z) 推导出欧拉角 (pitch, yaw, roll)，下面的 m[i][j] 指 2.4 节旋转矩阵的元素： Pitch (绕 X 轴): 由 m[1][2] = 2*(y*z - x*w) = -sin(pitch) 得到 sin(pitch) = 2 * (w*x - y*z) 因此 pitch = asin(2 * (w*x - y*z)) asin 的值域是 [-pi/2, pi/2]，计算前需要将输入钳制到 [-1, 1]，避免浮点误差导致 NaN。 Yaw (绕 Y 轴): 由 m[0][2] = sin(yaw)*cos(pitch) 和 m[2][2] = cos(yaw)*cos(pitch) 得到 tan(yaw) = (2*(w*y + x*z)) / (1 - 2*(x^2 + y^2)) (如果 cos(pitch) 不为 0) 使用 atan2 更稳健： yaw = atan2(2*(w*y + x*z), 1 - 2*(x^2 + y^2)) Roll (绕 Z 轴): 由 m[1][0] = sin(roll)*cos(pitch) 和 m[1][1] = cos(roll)*cos(pitch) 得到 tan(roll) = (2*(w*z + x*y)) / (1 - 2*(x^2 + z^2)) (如果 cos(pitch) 不为 0) 使用 atan2 更稳健： roll = atan2(2*(w*z + x*y), 1 - 2*(x^2 + z^2)) 万向锁处理: 当 pitch 接近 +/- pi/2 时 (sin(pitch) 接近 +/- 1，即 w*x - y*z 接近 +/- 0.5)，cos(pitch) 接近 0，此时发生万向锁。Yaw 和 Roll 的旋转轴重合，只能确定两者的组合，上面两个 atan2 的参数也同时趋近于 0，结果不再可靠。在这种情况下，通常约定将 Roll 设为 0，再由 m[0][1] 和 m[0][0] 计算 Yaw：yaw = atan2(2*(x*y - w*z), 1 - 2*(y^2 + z^2)) (当 pitch = +pi/2)；当 pitch = -pi/2 时第一个参数取反。在 roll = 0 的约定下，也可以直接使用 yaw = 2 * atan2(y, w)。 我们的 Quaternion::toEulerAnglesZYX() 实现中包含了对 asin 输入的钳制和使用 atan2，是比较标准的转换方法。 4. 
封装变换：Transform 类为了将位置、旋转（四元数）和缩放（向量）统一管理，我们创建了 Transform 类。 123456789101112131415161718192021222324252627282930313233343536// include/math/transform.h (部分)class Transform &#123;public: vec3f position; // 位置 Quaternion rotation; // 旋转 (内部使用四元数) vec3f scale; // 缩放 // 构造函数 (包括使用欧拉角的版本) Transform(); Transform(const vec3f&amp; pos, const Quaternion&amp; rot, const vec3f&amp; scl); Transform(const vec3f&amp; pos, const vec3f&amp; eulerAnglesDegreesZYX, const vec3f&amp; scl); // 设置/获取方法 (包括欧拉角版本) void setPosition(const vec3f&amp; pos); void setRotation(const Quaternion&amp; rot); void setScale(const vec3f&amp; scl); void setRotationEulerZYX(const vec3f&amp; eulerAnglesDegreesZYX); const vec3f&amp; getPosition() const; const Quaternion&amp; getRotation() const; const vec3f&amp; getScale() const; vec3f getRotationEulerZYX() const; // 获取欧拉角表示 // 应用变换的方法 void translate(const vec3f&amp; delta); void rotate(const Quaternion&amp; delta); // 组合旋转 void rotateEulerZYX(const vec3f&amp; deltaEulerDegreesZYX); // 应用欧拉角增量旋转 // 获取最终变换矩阵 mat4 getTransformMatrix() const; mat4 getNormalMatrix() const; // 用于法线变换 // 组合变换 (用于层级结构) Transform combine(const Transform&amp; parent) const; // ... 其他辅助方法如 lookAt ...&#125;; 核心方法：getTransformMatrix() 这个方法负责将存储的 position, rotation, scale 组合成一个最终的 4x4 变换矩阵，供渲染管线使用。标准的组合顺序是先缩放 (Scale)，然后旋转 (Rotate)，最后平移 (Translate)。对应的矩阵乘法顺序是 M = Matrix_Translate * Matrix_Rotate * Matrix_Scale。 123456// Transform::getTransformMatrix() 实现思路mat4 scaleMat = mat4::scale(scale.x, scale.y, scale.z);mat4 rotMat = rotation.toMatrix(); // 从四元数获取旋转矩阵mat4 transMat = mat4::translation(position.x, position.y, position.z);return transMat * rotMat * scaleMat; // T * R * S 法线变换矩阵：getNormalMatrix() 变换法线时，不能直接使用模型矩阵，尤其是存在非均匀缩放时。需要使用模型矩阵左上角 3x3 部分的逆转置矩阵。Transform 类也提供了计算这个矩阵的方法。 5. 
使用示例12345678910111213141516171819202122232425262728293031#include &quot;math/transform.h&quot;#include &quot;math/vector.h&quot;#include &lt;iostream&gt;int main() &#123; // 使用欧拉角创建 Transform (假设 ZYX: Pitch=45, Yaw=30, Roll=0) Transform myTransform(&#123;0, 0, -5&#125;, &#123;45.0f, 30.0f, 0.0f&#125;, &#123;1, 1, 1&#125;); // 平移 myTransform.translate(&#123;1, 0, 0&#125;); // 旋转 (再绕世界 Y 轴旋转 15 度) Quaternion deltaRot = Quaternion::fromAxisAngle(&#123;0, 1, 0&#125;, 15.0f * Q_DEG2RAD); myTransform.rotate(deltaRot); // 组合四元数旋转 // 或者使用欧拉角增量旋转 // myTransform.rotateEulerZYX(&#123;0.0f, 15.0f, 0.0f&#125;); // 获取最终矩阵给渲染器 mat4 finalModelMatrix = myTransform.getTransformMatrix(); mat4 finalNormalMatrix = myTransform.getNormalMatrix(); // 获取当前姿态的欧拉角表示 (可能与输入不完全一致，尤其是多次旋转后) vec3f currentEuler = myTransform.getRotationEulerZYX(); std::cout &lt;&lt; &quot;Current Euler ZYX (P,Y,R): &quot; &lt;&lt; currentEuler.x &lt;&lt; &quot;, &quot; &lt;&lt; currentEuler.y &lt;&lt; &quot;, &quot; &lt;&lt; currentEuler.z &lt;&lt; std::endl; // renderer.drawModel(model, myTransform, material); // 传递 Transform 对象 return 0;&#125; 6. 总结通过引入 Transform 类并使用四元数作为内部旋转表示，我们实现了： 更好的封装: 将位置、旋转、缩放数据聚合管理。 避免万向锁: 内部旋转计算使用四元数，更加健壮。 用户便利性: 依然可以通过欧拉角接口来设置和获取旋转，方便用户理解和调试。 清晰的变换流程: getTransformMatrix() 明确了 S-&gt;R-&gt;T 的变换顺序。 这为我们构建更复杂的场景、动画和物理交互系统打下了坚实的基础。虽然引入了四元数和转换的数学，但其带来的稳定性和灵活性是值得的。","tags":["C++","图形学","渲染","软渲染器","变换","四元数","欧拉角"],"categories":["Computer Graphics","技术分享"]},{"title":"加速小型矩阵乘法：3x3与4x4优化","path":"/2025/04/15/MATMUL_ACCELERATION/","content":"在计算机图形学、物理模拟、机器人技术以及众多科学计算领域，3x3 和 4x4 矩阵的乘法运算极为常见且对性能至关重要。虽然现代 CPU 速度飞快，但在需要执行数百万次这类运算的场景下（例如实时渲染的每一帧），即使是微小的优化也能带来显著的性能提升。本报告将探讨几种加速这两种特定尺寸矩阵乘法的实用技术。 1. 
基准：朴素算法 (Naive Algorithm)我们首先回顾标准的矩阵乘法定义。对于两个矩阵 $A$ ($m \\times n$) 和 $B$ ($n \\times p$)，它们的乘积 $C &#x3D; A \\times B$ 是一个 $m \\times p$ 的矩阵，其中每个元素 $C_{ij}$ 由下式给出： $$ C_{ij} &#x3D; \\sum_{k&#x3D;0}^{n-1} A_{ik} B_{kj} $$ 对于 $3 \\times 3$ 矩阵 ($m&#x3D;n&#x3D;p&#x3D;3$) 或 $4 \\times 4$ 矩阵 ($m&#x3D;n&#x3D;p&#x3D;4$)，这对应于一个三重嵌套循环： 123456789101112// 朴素 4x4 矩阵乘法示例 (C = A * B)mat4 multiply_naive(const mat4&amp; a, const mat4&amp; b) &#123; mat4 c = &#123;&#125;; // 初始化为零 for (int i = 0; i &lt; 4; ++i) &#123; // Row index for A and C for (int j = 0; j &lt; 4; ++j) &#123; // Column index for B and C for (int k = 0; k &lt; 4; ++k) &#123; // Inner dimension index c.m[i][j] += a.m[i][k] * b.m[k][j]; &#125; &#125; &#125; return c;&#125; 该算法的时间复杂度为 $O(n^3)$。对于固定的 $n&#x3D;3$ 或 $n&#x3D;4$，这本身并不算糟糕，但其循环结构引入了不可忽视的开销（计数器增量、条件判断、分支预测），并且可能不利于现代 CPU 的缓存和流水线执行。 2. 优化技术一：循环展开 (Loop Unrolling)循环展开是一种编译器优化技术，也可以手动实现，旨在通过减少或消除循环控制指令来降低开销。对于固定且较小的循环次数，我们可以完全展开循环。 2.1 完全展开 (3x3 矩阵)对于 $3 \\times 3$ 矩阵，乘法涉及计算 9 个结果元素。每个元素需要 3 次乘法和 2 次加法。总计 27 次乘法和 18 次加法。由于计算量固定且不大，我们可以完全展开所有循环，直接计算每个结果元素： 计算 $C_{00}$:$$ C_{00} &#x3D; A_{00}B_{00} + A_{01}B_{10} + A_{02}B_{20} $$计算 $C_{01}$:$$ C_{01} &#x3D; A_{00}B_{01} + A_{01}B_{11} + A_{02}B_{21} $$… 以此类推，直到 $C_{22}$。 代码原理: 12345678910111213141516171819mat3 multiply_3x3_unrolled(const mat3&amp; a, const mat3&amp; b) &#123; mat3 result; // C = A * B // 直接计算每个元素，无需循环 // Row 0 result.m[0][0] = a.m[0][0] * b.m[0][0] + a.m[0][1] * b.m[1][0] + a.m[0][2] * b.m[2][0]; result.m[0][1] = a.m[0][0] * b.m[0][1] + a.m[0][1] * b.m[1][1] + a.m[0][2] * b.m[2][1]; result.m[0][2] = a.m[0][0] * b.m[0][2] + a.m[0][1] * b.m[1][2] + a.m[0][2] * b.m[2][2]; // Row 1 (类似计算) result.m[1][0] = a.m[1][0] * b.m[0][0] + a.m[1][1] * b.m[1][0] + a.m[1][2] * b.m[2][0]; // ... result.m[1][1], result.m[1][2] ... // Row 2 (类似计算) result.m[2][0] = a.m[2][0] * b.m[0][0] + a.m[2][1] * b.m[1][0] + a.m[2][2] * b.m[2][0]; // ... result.m[2][1], result.m[2][2] ... 
return result;&#125; 这种方法消除了所有循环开销，指令流水线可能更流畅。缺点是代码体积增大，但对于 3x3 来说通常是值得的，且具有良好的可移植性。 2.2 部分展开 (4x4 矩阵)对于 $4 \\times 4$ 矩阵，完全展开（计算 16 个元素，每个需要 4 次乘法和 3 次加法，总计 64 次乘法和 48 次加法）虽然可行，但代码会非常冗长。更常见的做法是仅展开最内层的循环（k 循环），它对应于点积计算。 代码原理: 1234567891011121314mat4 multiply_4x4_inner_unrolled(const mat4&amp; a, const mat4&amp; b) &#123; mat4 result = &#123;&#125;; // Initialize to zero for (int i = 0; i &lt; 4; ++i) &#123; for (int j = 0; j &lt; 4; ++j) &#123; // 内层循环 k 被展开 result.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j] + a.m[i][2] * b.m[2][j] + a.m[i][3] * b.m[3][j]; &#125; &#125; return result;&#125; 这保留了外两层循环，但消除了最频繁执行的内层循环的开销，是一个很好的性能与代码复杂度之间的折衷。 3. 优化技术二：SIMD 向量化 (Vectorization)现代 CPU 普遍支持 SIMD（Single Instruction, Multiple Data）指令集，如 SSE (Streaming SIMD Extensions) 和 AVX (Advanced Vector Extensions)。这些指令允许 CPU 在单个时钟周期内对多个数据元素（通常是 4 个 32 位浮点数或整数）执行相同的操作。 3.1 SIMD 与 4x4 矩阵$4 \\times 4$ 矩阵与 4 元素向量操作（如 SSE 的 128 位寄存器，可容纳 4 个 float）天然契合。我们可以将矩阵的行或列视为向量进行并行计算。 一种常见的 SIMD 策略是计算结果矩阵 $C$ 的一行 $C_i$：$$ C_i &#x3D; \\sum_{k&#x3D;0}^{3} A_{ik} B_k $$其中 $A_{ik}$ 是标量（矩阵 $A$ 的元素），$B_k$ 是矩阵 $B$ 的第 $k$ 行（视为向量）。在 SIMD 实现中，这通常转化为： 将 $B$ 的 4 行加载到 4 个 SIMD 寄存器中。 对于 $A$ 的第 $i$ 行，将其元素 $A_{i0}, A_{i1}, A_{i2}, A_{i3}$ 逐个“广播”（复制）到 SIMD 寄存器的所有通道。 使用广播后的 $A_{ik}$ 值与加载的 $B_k$ 行向量进行并行乘法。 将 4 次乘法的结果累加起来，得到结果行 $C_i$。 代码原理 (SSE 示例): 1234567891011121314151617181920212223242526272829303132333435#include &lt;immintrin.h&gt; // Or &lt;xmmintrin.h&gt;, &lt;emmintrin.h&gt; etc.mat4 multiply_4x4_sse(const mat4&amp; a, const mat4&amp; b) &#123; mat4 result; const float* pA = &amp;a.m[0][0]; const float* pB = &amp;b.m[0][0]; float* pResult = &amp;result.m[0][0]; for (int i = 0; i &lt; 4; ++i) &#123; // Calculate row i of result // Load rows of B into SSE registers (__m128 holds 4 floats) __m128 b_row0 = _mm_loadu_ps(&amp;pB[0 * 4]); // Unaligned load row 0 __m128 b_row1 = _mm_loadu_ps(&amp;pB[1 * 4]); // Unaligned load row 1 __m128 b_row2 = _mm_loadu_ps(&amp;pB[2 * 4]); // Unaligned load row 2 __m128 
b_row3 = _mm_loadu_ps(&amp;pB[3 * 4]); // Unaligned load row 3 // Pointer to current row of A const float* a_row_ptr = &amp;pA[i * 4]; // Accumulator for the result row C[i] __m128 result_row; // C[i] = A[i][0] * B[0] result_row = _mm_mul_ps(_mm_set1_ps(a_row_ptr[0]), b_row0); // C[i] += A[i][1] * B[1] result_row = _mm_add_ps(result_row, _mm_mul_ps(_mm_set1_ps(a_row_ptr[1]), b_row1)); // C[i] += A[i][2] * B[2] result_row = _mm_add_ps(result_row, _mm_mul_ps(_mm_set1_ps(a_row_ptr[2]), b_row2)); // C[i] += A[i][3] * B[3] result_row = _mm_add_ps(result_row, _mm_mul_ps(_mm_set1_ps(a_row_ptr[3]), b_row3)); // Store the calculated row into the result matrix _mm_storeu_ps(&amp;pResult[i * 4], result_row); // Unaligned store &#125; return result;&#125; (_mm_set1_ps 用于广播标量，_mm_loadu_ps &#x2F; _mm_storeu_ps 用于非对齐内存访问，_mm_mul_ps 和 _mm_add_ps 执行并行的乘法和加法)。 SIMD 实现通常能提供最高的性能，但代价是代码可移植性差（依赖特定指令集）和复杂性增加。 3.2 SIMD 与 3x3 矩阵将 SIMD 应用于 $3 \\times 3$ 矩阵比较棘手，因为 SIMD 寄存器通常是 4 宽的。需要进行数据填充、掩码操作或复杂的重排（shuffling），这可能引入额外的开销，抵消并行计算的优势。因此，对于 3x3 矩阵，完全循环展开通常是更实用、更高效的选择。 4. 编译器优化与标志除了手动优化代码，编译器的优化能力也至关重要。 优化级别: 务必启用优化标志，如 GCC&#x2F;Clang 的 -O2 或 -O3，MSVC 的 &#x2F;O2。这些标志会启用包括自动循环展开、指令重排、自动向量化（有时）在内的多种优化。 目标架构: 使用 -march&#x3D;native (GCC&#x2F;Clang) 或 &#x2F;arch:AVX2 (MSVC) 等标志，可以让编译器生成针对特定 CPU 指令集的优化代码，充分利用可用的 SIMD 功能，即使是在看似普通的 C++ 代码（如循环展开版本）上，编译器也可能生成 SIMD 指令。 5. 为何不用 Strassen 等算法？Strassen 算法及其变种具有优于 $O(n^3)$ 的渐近复杂度（Strassen 约为 $O(n^{2.81})$）。然而，这些算法的常数因子和管理开销（递归、额外的加减法、内存分配）非常高。对于 $n&#x3D;3$ 或 $n&#x3D;4$ 这样的小尺寸，其开销远超其理论优势，实际性能通常不如经过优化的朴素算法。 6. 
结论与建议加速 3x3 和 4x4 矩阵乘法没有银弹，需要根据目标平台、性能需求和可接受的复杂性来选择策略： 3x3 矩阵: 完全循环展开 通常是最佳选择，它提供了良好的性能提升，且代码相对直接，可移植性好。 4x4 矩阵:如果追求极致性能且目标平台确定（例如 PC 游戏开发），手动 SIMD (SSE&#x2F;AVX) 实现 是首选。 如果需要更好的可移植性或希望简化代码，内层循环展开 是一个可靠且有效的优化。无论哪种手动优化，配合编译器的 -O2&#x2F;-O3 和 -march&#x3D;native&#x2F;&#x2F;arch:AVX2 标志 都至关重要，以发挥硬件和编译器的全部潜力。 最后，任何性能优化都应基于实际测量 (Profiling)。在目标环境和典型负载下测试不同实现的性能，才能确定哪种方法真正“最快”。","tags":["C++","Optimization","SIMD","Matrix","Linear Algebra","Computer Graphics"],"categories":["Programming","Performance"]},{"title":"增强真实感：为软渲染器添加 AO、高光和光泽度贴图","path":"/2025/04/13/TEXTURE-MAPPINGS/","content":"在上一篇文章中，我们成功地为 C++ 软渲染器添加了法线贴图支持，让低模也能展现丰富的表面几何细节。然而，要进一步提升渲染的真实感，我们还需要引入更多控制光照和材质表现的细节。本文将介绍如何继续扩展我们的渲染管线，加入环境光遮蔽 (Ambient Occlusion - AO)、高光颜色 (Specular Color) 和 光泽度 (Glossiness) 贴图。 1. 回顾：基础光照与法线贴图目前，我们的 Blinn-Phong 着色器已经能够处理： 漫反射贴图 (Diffuse Map): 定义物体表面的基础颜色。 法线贴图 (Normal Map): 提供逐像素的法线信息，模拟几何细节。 统一的材质属性: 如环境光颜色 (ambientColor)、漫反射颜色 (diffuseColor)、高光颜色 (specularColor) 和光泽度指数 (shininess)，这些属性对整个物体生效。 虽然效果已经不错，但真实世界的材质表现远比这复杂。例如，金属和绝缘体的反光方式不同；物体缝隙中的环境光会更少；表面的粗糙度也会影响高光的形状。 2. 新成员：增强细节的纹理贴图为了更精细地控制渲染效果，我们引入以下三种新的纹理贴图： 2.1 环境光遮蔽 (Ambient Occlusion - AO) 贴图 作用: AO 贴图描述了模型表面某一点接收间接环境光的程度。它模拟了几何体自身或邻近几何体对环境光的遮挡效果。通常，缝隙、角落、褶皱等难以被环境光照射到的地方，其 AO 值较低（偏黑），而暴露在外的表面 AO 值较高（偏白）。 实现方式: AO 贴图通常是一张灰度图。在片元着色器中，我们采样 AO 图得到一个遮蔽因子 aoFactor (范围 0.0 到 1.0)。这个因子用于调制 (乘以) 最终的环境光贡献。 12// 伪代码AmbientTerm = GlobalAmbientLight * MaterialAmbientColor * aoFactor; 视觉效果: AO 贴图可以显著增强模型的体积感和细节，尤其是在缺少复杂全局光照计算的简单渲染管线中，能有效地模拟出接触阴影和几何体之间的遮挡感。 （示意图：左侧无 AO，右侧有 AO） 2.2 高光颜色 (Specular Color) 贴图 作用: 此贴图定义了表面高光反射的颜色和强度。基础的 Blinn-Phong 模型通常使用一个统一的 specularColor。但现实中，不同材质的高光颜色不同（例如，金属的高光通常带有金属本身的颜色，而绝缘体的高光通常是白色）。Specular 贴图允许我们逐像素地控制这一点。 实现方式: 在片元着色器中，如果 Specular 贴图存在，我们就采样它来获取当前片元的高光颜色 mapSpecularColor，并用它替代统一的 uniform_SpecularColor。如果贴图不存在，则回退使用统一颜色。 1234567// 伪代码if (useSpecularMap) &#123; matSpecular = sample(specularTexture, uv);&#125; else &#123; matSpecular = uniform_SpecularColor;&#125;// ... 使用 matSpecular 计算高光 ... 
视觉效果: 可以表现混合材质，如金属上的锈迹（锈迹部分高光弱或无），或者带有特定颜色反射的材质。 2.3 光泽度 (Glossiness) 贴图 作用: 光泽度贴图（有时也叫光滑度 Smoothness 图，或者反过来用粗糙度 Roughness 图）控制表面高光的锐利程度。光滑的表面（如镜子、抛光金属）有小而亮的高光，而粗糙的表面（如磨砂塑料、石头）则有模糊而散开的高光。 实现方式: 在 Blinn-Phong 模型中，高光的锐利程度由 shininess 指数控制（值越高，高光越小越亮）。Gloss 贴图通常是灰度图，其值（范围 0.0 到 1.0）需要映射到一个合适的 shininess 范围。例如，可以将 Gloss 值 0.0 映射到最低 shininess（如 2），将 1.0 映射到最高 shininess（如 256 或更高）。 123456789// 伪代码if (useGlossMap) &#123; glossFactor = sample(glossTexture, uv).r; // 取单通道 // 线性映射示例 currentShininess = lerp(MIN_SHININESS, MAX_SHININESS, glossFactor);&#125; else &#123; currentShininess = uniform_Shininess;&#125;// ... specFactor = pow(NdotH, currentShininess) ... 注意：从 Gloss 值到 Shininess 的映射关系可以根据需要调整，线性、指数或自定义曲线都可以。 视觉效果: 极大地增强了材质的区分度，能清晰地表现出物体表面的光滑或粗糙程度。 3. 代码实现要点将这些贴图集成到我们现有的渲染器中，主要涉及以下修改： 3.1 数据结构 (Material, Shader) 在 Material 结构体中添加 aoTexture, specularTexture, glossTexture 成员（类型为 Texture）以及对应的加载函数。 在 Shader 基类中添加对应的 uniform_AoTexture, uniform_SpecularTexture, uniform_GlossTexture uniform 变量，以及 uniform_UseAoMap, uniform_UseSpecularMap, uniform_UseGlossMap 的布尔标志。 123456789101112131415161718192021// include/core/material.h (部分)struct Material &#123; // ... (之前的成员) ... Texture aoTexture; Texture specularTexture; Texture glossTexture; // ... (加载函数) ...&#125;;// include/core/shader.h (部分)class Shader &#123;public: // ... (之前的 Uniforms) ... Texture uniform_AoTexture; bool uniform_UseAoMap = false; Texture uniform_SpecularTexture; bool uniform_UseSpecularMap = false; Texture uniform_GlossTexture; bool uniform_UseGlossMap = false; // ...&#125;; 3.2 渲染器 (Renderer)在 Renderer::drawModel 函数中设置 Shader Uniform 的部分，添加对新贴图和标志的设置： 123456789101112131415161718// src/core/renderer.cpp (drawModel 部分)void Renderer::drawModel(Model&amp; model, const mat4&amp; modelMatrix, const Material&amp; material) &#123; // ... (检查 shader) ... auto&amp; shader = *material.shader; // ... (设置矩阵、光照、基础材质 Uniforms) ... 
// 设置新贴图 Uniforms 和 Flags shader.uniform_AoTexture = material.aoTexture; shader.uniform_UseAoMap = !material.aoTexture.empty(); shader.uniform_SpecularTexture = material.specularTexture; shader.uniform_UseSpecularMap = !material.specularTexture.empty(); shader.uniform_GlossTexture = material.glossTexture; shader.uniform_UseGlossMap = !material.glossTexture.empty(); // ... (顶点处理与渲染循环) ...&#125; 3.3 片元着色器 (Fragment Shader)这是改动最大的地方，在 BlinnPhongShader::fragment 中集成新贴图的采样和应用逻辑： 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263// src/core/blinn_phong_shader.cpp (fragment 部分)bool BlinnPhongShader::fragment(const Varyings&amp; input, vec3f&amp; outColor) &#123; // --- 获取法线 N (处理法线贴图) --- // ... (同上一篇文章) ... // --- 获取材质属性 (考虑贴图) --- // Diffuse Color vec3f matDiffuse = uniform_DiffuseColor; if (uniform_UseDiffuseMap &amp;&amp; !uniform_DiffuseTexture.empty()) &#123; /* modulate */ &#125; // Specular Color vec3f matSpecular = uniform_SpecularColor; // Default if (uniform_UseSpecularMap &amp;&amp; !uniform_SpecularTexture.empty()) &#123; matSpecular = uniform_SpecularTexture.sample(input.uv.x, input.uv.y); // Override &#125; // Shininess (via Gloss Map) int currentShininess = uniform_Shininess; // Default if (uniform_UseGlossMap &amp;&amp; !uniform_GlossTexture.empty()) &#123; float glossFactor = uniform_GlossTexture.sample(input.uv.x, input.uv.y).x; // Sample gloss (e.g., R channel) glossFactor = std::max(0.0f, std::min(1.0f, glossFactor)); const int minShininess = 2; const int maxShininess = 256; // Adjust range as needed currentShininess = minShininess + static_cast&lt;int&gt;(static_cast&lt;float&gt;(maxShininess - minShininess) * glossFactor); currentShininess = std::max(minShininess, currentShininess); &#125; // --- AO Factor --- float aoFactor = 1.0f; // Default: full ambient if (uniform_UseAoMap &amp;&amp; !uniform_AoTexture.empty()) &#123; aoFactor = uniform_AoTexture.sample(input.uv.x, 
input.uv.y).x; // Sample AO (e.g., R channel) aoFactor = std::max(0.0f, std::min(1.0f, aoFactor)); &#125; // --- 光照计算 --- vec3f V = (uniform_CameraPosition - input.worldPosition).normalized(); vec3f matAmbient = uniform_AmbientColor; // Ambient Term (modulated by AO) vec3f ambientTerm = uniform_AmbientLight * matAmbient * aoFactor; vec3f totalColor = ambientTerm; // 循环处理光源 for (const auto&amp; light : uniform_Lights) &#123; // ... (计算 L, lightCol, attenuation) ... // Diffuse Term float NdotL = std::max(0.0f, N.dot(L)); vec3f diffuse = matDiffuse * lightCol * NdotL * attenuation; // Specular Term (using derived matSpecular and currentShininess) vec3f H = (L + V).normalized(); float NdotH = std::max(0.0f, N.dot(H)); float specFactor = fastPow(NdotH, currentShininess); // Use mapped shininess vec3f specular = matSpecular * lightCol * specFactor * attenuation; // Use mapped specular color totalColor = totalColor + diffuse + specular; &#125; // --- Final Color --- // ... (Clamp outColor) ... return true;&#125; 4. 效果展示当这几种贴图组合在一起时，渲染结果的真实感将得到显著提升。金属部分会呈现出带有颜色的高光，锈迹部分则显得暗淡粗糙；模型缝隙的阴影感更强，整体光照更加自然。 （示意图：对比仅有 Diffuse&#x2F;Normal 与 包含 AO&#x2F;Specular&#x2F;Gloss 的渲染效果） 5. 总结与展望通过引入 AO、Specular 和 Gloss 贴图，我们的软渲染器在表现材质细节方面迈进了一大步。这使得我们能够更精细地控制光照的各个方面，模拟出更加多样和逼真的表面效果。 这些贴图的概念实际上也是基于物理的渲染 (Physically Based Rendering - PBR) 工作流的核心组成部分（尽管 PBR 通常使用不同的参数组合，如 Albedo、Metallic、Roughness、AO）。虽然我们当前的 Blinn-Phong 光照模型并非严格意义上的 PBR，但对这些贴图的支持为将来向更先进的 PBR 光照模型迁移打下了良好的基础。 下一步，可以考虑实现更复杂的 PBR 光照模型（如 Cook-Torrance），或者引入环境贴图 (Environment Mapping) 来实现基于图像的光照 (Image-Based Lighting - IBL)，让渲染效果更上一层楼。","tags":["C++","图形学","渲染","软渲染器","PBR 基础","AO","贴图"],"categories":["Computer Graphics","技术分享"]},{"title":"在软渲染器中实现法线贴图 (Normal Mapping)","path":"/2025/04/12/NORMAL-MAPPING/","content":"在实时计算机图形学中，模型的细节往往受到多边形数量的限制。为了在不显著增加模型复杂度的前提下，模拟出丰富的表面细节（如凹凸、划痕、纹理），法线贴图技术应运而生。本文将详细介绍如何在基于 C++ 的软件渲染器中实现切线空间法线贴图 (Tangent Space Normal Mapping)。 1. 
问题的提出：低模的局限性传统的低多边形模型 (Low-Poly Model) 依赖于顶点法线 (Vertex Normals) 进行光照计算。通过 Gouraud Shading（在顶点之间插值光照颜色）或 Phong Shading（在顶点之间插值法线），我们可以获得平滑的光照过渡效果。然而，这种方法无法表现模型表面的微小几何细节。如果想要模型拥有丰富的凹凸细节，就需要极高数量的多边形，这对于实时渲染来说通常是不可接受的。 （示意图：左侧为低模+顶点法线光照，右侧为低模+法线贴图光照） 2. 解决方案：法线贴图法线贴图的核心思想是：用一张纹理来存储模型表面各点的法线信息。这张特殊的纹理被称为“法线贴图”。在渲染时，我们不再直接使用插值得到的顶点法线，而是从法线贴图中采样对应片元 (Fragment) 的法线向量，并用这个采样得到的法线来进行光照计算。 由于纹理可以存储非常丰富的信息，即使模型本身多边形数量很少，通过法线贴图也能模拟出极其逼真的表面细节。 3. 关键概念：切线空间 (Tangent Space)直接将世界空间 (World Space) 或模型空间 (Model Space) 的法线存储在纹理中是可行的，但这会导致法线贴图与模型的特定姿态或变换绑定，难以复用。更常用的方法是使用 切线空间 (Tangent Space)。 切线空间是一个局部坐标系，定义在模型的每个表面点上。它由三个相互正交（或近似正交）的基向量构成： 法线 (Normal - N): 即该点的原始顶点法线，通常垂直于表面。 切线 (Tangent - T): 平行于表面，通常沿着纹理坐标 U 的增加方向。 副切线 (Bitangent - B): 平行于表面，通常沿着纹理坐标 V 的增加方向，并且可以通过 N 和 T 的叉乘得到 (B = cross(N, T)) 来保证正交性。 （示意图：模型表面一点的切线空间 TBN 基向量） 法线贴图中存储的是相对于这个局部 TBN 坐标系的法线扰动。通常，RGB 通道对应 TBN 向量： R -&gt; Tangent 方向分量 G -&gt; Bitangent 方向分量 B -&gt; Normal 方向分量 一个“平坦”表面的法线在切线空间中通常是 (0, 0, 1)。由于颜色通道通常存储在 [0, 1] 范围内，而法线分量在 [-1, 1] 范围内，因此需要进行映射。常用的映射方式为： 存储值 &#x3D; (法线分量 + 1.0) &#x2F; 2.0 或者反过来，从纹理采样值恢复法线分量： 法线分量 &#x3D; 采样值 * 2.0 - 1.0 因此，法线贴图中常见的“基准”蓝色 (0.5, 0.5, 1.0) 就代表了切线空间中的 (0, 0, 1) 法线，即未发生扰动的原始表面法线方向。 使用切线空间的好处： 解耦: 法线信息与模型的具体旋转、变形无关。 复用: 同一张法线贴图可以应用在不同模型或模型的不同部分（只要它们的 UV 布局允许）。 压缩友好: 切线空间法线的 Z 分量（Normal 方向）始终为正且大多接近 1，因此可以只存储 X、Y 两个分量，Z 在采样后由 sqrt(1 - x*x - y*y) 重建。 4. 
实现步骤要在我们的软渲染器中实现切线空间法线贴图，需要修改渲染管线的多个阶段。 4.1 计算顶点切线和副切线我们需要为模型的每个顶点计算其 TBN 基向量。这通常在模型加载后、渲染前完成。计算方法基于构成三角形的顶点位置和纹理坐标： 对于三角形 P0, P1, P2 及其对应的纹理坐标 UV0, UV1, UV2： 计算边向量： Edge1 = P1 - P0 Edge2 = P2 - P0 计算 UV 差量： DeltaUV1 = UV1 - UV0 DeltaUV2 = UV2 - UV0 计算系数 f（分母为 0 表示该三角形的 UV 退化，需要单独处理）： f = 1.0 / (DeltaUV1.x * DeltaUV2.y - DeltaUV2.x * DeltaUV1.y) 计算切线 T 和副切线 B： Tangent.x = f * (DeltaUV2.y * Edge1.x - DeltaUV1.y * Edge2.x) Tangent.y = f * (DeltaUV2.y * Edge1.y - DeltaUV1.y * Edge2.y) Tangent.z = f * (DeltaUV2.y * Edge1.z - DeltaUV1.y * Edge2.z) Bitangent.x = f * (-DeltaUV2.x * Edge1.x + DeltaUV1.x * Edge2.x) Bitangent.y = f * (-DeltaUV2.x * Edge1.y + DeltaUV1.x * Edge2.y) Bitangent.z = f * (-DeltaUV2.x * Edge1.z + DeltaUV1.x * Edge2.z) 计算出的 T 和 B 需要累加到每个顶点上（因为一个顶点可能被多个三角形共享）。 最后，对每个顶点的 T 和 B 进行正交化和归一化处理，常用 Gram-Schmidt 方法： T = normalize(T - N * dot(N, T)) &#x2F;&#x2F; 使 T 正交于 N 检查 dot(cross(N, T), B) 的符号，判断 TBN 坐标系的左右手性是否与 UV 坐标系一致，必要时翻转 T。 B = normalize(cross(N, T)) &#x2F;&#x2F; 重新计算 B 以确保正交 代码片段 (Model::calculateTangents): // src/core/model.cppvoid Model::calculateTangents() &#123; tangents.assign(numVertices(), vec3f(0.0f, 0.0f, 0.0f)); bitangents.assign(numVertices(), vec3f(0.0f, 0.0f, 0.0f)); // --- Loop through faces to calculate T and B contributions --- for (size_t i = 0; i &lt; numFaces(); ++i) &#123; // ... (Get vertices v0, v1, v2 and uvs uv0, uv1, uv2) ... 
vec3f edge1 = v1 - v0; vec3f edge2 = v2 - v0; vec2f deltaUV1 = uv1 - uv0; vec2f deltaUV2 = uv2 - uv0; float f = 1.0f / (deltaUV1.x * deltaUV2.y - deltaUV2.x * deltaUV1.y); if (std::isinf(f) || std::isnan(f)) &#123; f = 0.0f; &#125; // Avoid NaN/Inf vec3f tangent = (edge1 * deltaUV2.y - edge2 * deltaUV1.y) * f; vec3f bitangent = (edge2 * deltaUV1.x - edge1 * deltaUV2.x) * f; // Accumulate for vertices for (int j = 0; j &lt; 3; ++j) &#123; tangents[face.vertIndex[j]] = tangents[face.vertIndex[j]] + tangent; bitangents[face.vertIndex[j]] = bitangents[face.vertIndex[j]] + bitangent; &#125; &#125; // --- Loop through vertices to orthogonalize and normalize --- for (size_t i = 0; i &lt; numVertices(); ++i) &#123; const vec3f&amp; n = getNormal(i); // Assuming normal indices match vertex indices after processing vec3f&amp; t = tangents[i]; vec3f&amp; b = bitangents[i]; if (t.length() &gt; 1e-6f &amp;&amp; n.length() &gt; 1e-6f) &#123; // Gram-Schmidt orthogonalize T against N t = (t - n * n.dot(t)).normalized(); // Check handedness and recalculate B if (n.cross(t).dot(b) &lt; 0.0f) &#123; t = t * -1.0f; // Flip tangent if needed &#125; b = n.cross(t).normalized(); // Ensure B is orthogonal and normalized &#125; else &#123; // Handle degenerate cases: Create arbitrary orthogonal basis vec3f up = (std::abs(n.y) &lt; 0.99f) ? 
vec3f(0.0f, 1.0f, 0.0f) : vec3f(1.0f, 0.0f, 0.0f); t = n.cross(up).normalized(); b = n.cross(t).normalized(); &#125; // Fallback for NaN/Inf safety if (std::isnan(t.x) || std::isinf(t.x)) t = vec3f(1,0,0); if (std::isnan(b.x) || std::isinf(b.x)) b = vec3f(0,0,1); &#125;&#125; 将计算得到的 tangents 和 bitangents 存储在 Model 类中。 4.2 数据准备与传递 Material: 在 Material 结构体中添加 normalTexture 成员及加载方法。 Shader Uniforms: 在 Shader 基类中添加 uniform_NormalTexture (类型 Texture) 和 uniform_UseNormalMap (类型 bool)。 Vertex Input: 修改 VertexInput 结构体，添加 tangent 和 bitangent 成员。 12345678// include/core/shader.hstruct VertexInput &#123; vec3f position; vec3f normal; vec2f uv; vec3f tangent; // Added vec3f bitangent; // Added&#125;; Varyings: 修改 Varyings 结构体，传递世界空间下的 TBN 基向量。 12345678910// include/core/shader.hstruct Varyings &#123; vec4f clipPosition; vec3f worldPosition; vec2f uv; // World-space TBN basis vectors vec3f tangent; // World Tangent vec3f bitangent; // World Bitangent vec3f normal; // World (Geometric) Normal&#125;; Renderer: 在 Renderer::drawModel 中，设置 uniform_NormalTexture 和 uniform_UseNormalMap。在构建 VertexInput 时，从 Model 获取 tangent 和 bitangent。 4.3 顶点着色器 (Vertex Shader)顶点着色器的主要任务是将 TBN 基向量从模型空间转换到世界空间，并传递给片元着色器。 代码片段 (BlinnPhongShader::vertex): 123456789101112131415161718192021222324252627// src/core/blinn_phong_shader.cppVaryings BlinnPhongShader::vertex(const VertexInput&amp; input) &#123; Varyings output; vec4f modelPos4(input.position, 1.0f); vec4f modelNormal4(input.normal, 0.0f); vec4f modelTangent4(input.tangent, 0.0f); vec4f modelBitangent4(input.bitangent, 0.0f); // Calculate world position output.worldPosition = (uniform_ModelMatrix * modelPos4).xyz(); // Transform TBN vectors to world space using Normal Matrix // uniform_NormalMatrix is typically transpose(inverse(ModelMatrix)) // Ensure they are normalized after transformation. 
output.normal = (uniform_NormalMatrix * modelNormal4).xyz().normalized(); /* Tangent and bitangent are directions lying in the surface, so they are transformed with the model matrix (w = 0 drops the translation); only the normal needs the inverse-transpose normal matrix. */ output.tangent = (uniform_ModelMatrix * modelTangent4).xyz().normalized(); output.bitangent = (uniform_ModelMatrix * modelBitangent4).xyz().normalized(); // Optional: Recalculate bitangent worldB = cross(worldN, worldT) here for robustness. // Pass UVs output.uv = input.uv; // Calculate clip space position output.clipPosition = uniform_MVP * modelPos4; return output;&#125; 4.4 片元着色器 (Fragment Shader)片元着色器是实现法线贴图的核心： 检查是否使用法线贴图: 根据 uniform_UseNormalMap 标志以及法线贴图纹理是否为空来判断。 采样法线贴图: 如果使用，则根据插值得到的 uv 坐标采样 uniform_NormalTexture。 解压法线: 将采样到的 [0, 1] 颜色值转换回 [-1, 1] 的切线空间法线向量 N_{tangent}。 N_{tangent} &#x3D; normalize(Sample_{RGB} * 2.0 - 1.0) 构建 TBN 矩阵: 使用从顶点着色器插值得到的世界空间 TBN 基向量（需要重新归一化）。 T &#x3D; normalize(input.tangent) B &#x3D; normalize(input.bitangent) N_{geom} &#x3D; normalize(input.normal) 转换法线: 将切线空间法线 N_{tangent} 转换到世界空间。 N_{world} &#x3D; normalize(T * N_{tangent}.x + B * N_{tangent}.y + N_{geom} * N_{tangent}.z) 光照计算: 使用计算得到的 N_{world} (如果使用了法线贴图) 或 N_{geom} (如果未使用) 进行后续的 Blinn-Phong 或其他光照模型计算。 代码片段 (BlinnPhongShader::fragment): // src/core/blinn_phong_shader.cppbool BlinnPhongShader::fragment(const Varyings&amp; input, vec3f&amp; outColor) &#123; vec3f N; // The final normal used for lighting if (uniform_UseNormalMap &amp;&amp; !uniform_NormalTexture.empty()) &#123; // 1. Sample the normal map vec3f tangentNormalSample = uniform_NormalTexture.sample(input.uv.x, input.uv.y); // 2. Unpack from [0,1] to [-1,1] and normalize vec3f tangentNormal = (tangentNormalSample * 2.0f) - vec3f(1.0f, 1.0f, 1.0f); tangentNormal = tangentNormal.normalized(); // Ensure unit length // 3. Get interpolated world-space TBN basis (renormalize) vec3f T = input.tangent.normalized(); vec3f B = input.bitangent.normalized(); vec3f N_geom = input.normal.normalized(); // 4. 
Transform tangent-space normal to world space // N_world = T*Nx_tan + B*Ny_tan + N_geom*Nz_tan N = T * tangentNormal.x + B * tangentNormal.y + N_geom * tangentNormal.z; N = N.normalized(); // Final normal for lighting &#125; else &#123; // Use interpolated geometric normal if no normal map N = input.normal.normalized(); &#125; // --- Proceed with Blinn-Phong lighting using the final Normal N --- vec3f V = (uniform_CameraPosition - input.worldPosition).normalized(); // View direction vec3f totalColor = uniform_AmbientLight * uniform_AmbientColor; // Start with ambient // Get material properties (potentially textured) vec3f matDiffuse = uniform_DiffuseColor; if (!uniform_DiffuseTexture.empty()) &#123; matDiffuse = matDiffuse * uniform_DiffuseTexture.sample(input.uv.x, input.uv.y); &#125; // ... (Get matSpecular, matShininess) ... for (const auto&amp; light : uniform_Lights) &#123; // ... (Calculate L, lightCol, attenuation) ... // Diffuse float NdotL = std::max(0.0f, N.dot(L)); vec3f diffuse = matDiffuse * lightCol * NdotL * attenuation; // Specular (Blinn-Phong) vec3f H = (L + V).normalized(); float NdotH = std::max(0.0f, N.dot(H)); float specFactor = fastPow(NdotH, uniform_Shininess); // Use the fastPow utility vec3f specular = uniform_SpecularColor * lightCol * specFactor * attenuation; totalColor = totalColor + diffuse + specular; &#125; // Clamp final color outColor.x = std::min(1.0f, std::max(0.0f, totalColor.x)); outColor.y = std::min(1.0f, std::max(0.0f, totalColor.y)); outColor.z = std::min(1.0f, std::max(0.0f, totalColor.z)); return true; // Pixel should be written&#125; 4.5 插值 (Interpolation)确保 Renderer::interpolateVaryings 函数能够正确地对新增的 tangent, bitangent, normal 向量进行透视矫正插值。由于 interpolateVaryings 内部使用了模板化的 perspectiveCorrectInterpolate，只需要在 interpolateVaryings 中添加对这三个新成员的调用即可。 123456789// src/core/renderer.cppVaryings Renderer::interpolateVaryings(float t, const Varyings&amp; start, const Varyings&amp; end, float startInvW, float endInvW) const &#123; 
Varyings result; // ... (Interpolate worldPosition, uv) ... result.normal = perspectiveCorrectInterpolate(t, start.normal, end.normal, startInvW, endInvW); result.tangent = perspectiveCorrectInterpolate(t, start.tangent, end.tangent, startInvW, endInvW); result.bitangent = perspectiveCorrectInterpolate(t, start.bitangent, end.bitangent, startInvW, endInvW); return result;&#125; 5. 总结与效果通过以上步骤，我们就成功地在软渲染器中集成了切线空间法线贴图。渲染低多边形模型时，通过在片元着色器中查询法线贴图并使用得到的法线进行光照计算，可以在几乎不增加几何复杂度的前提下，极大地提升模型的表面细节和真实感。 这项技术是现代实时渲染中不可或缺的一部分，能够以较低的性能开销实现高质量的视觉效果。 6. 注意事项 Tangent Calculation: 上述切线计算方法比较基础，对于复杂的 UV 布局或重叠 UV 可能产生问题。更精确的方法（如 MikkTSpace）更为健壮。 Normal Map Format: 注意法线贴图的 Y 分量（通常是绿色通道）在不同规范（如 OpenGL 和 DirectX）中可能方向相反。需要确保加载和解压时使用正确的约定。 TBN 正交性: 插值后的 TBN 基向量可能不再严格正交，在片元着色器中重新正交化（如通过 B &#x3D; cross(N, T)）可以提高精度，但会增加计算量。 sRGB: 如果法线贴图被错误地当作 sRGB 纹理处理，会导致解压出的法线不准确。应确保法线贴图作为线性数据处理。","tags":["C++","图形学","渲染","法线贴图","软渲染器"],"categories":["Computer Graphics","技术分享"]},{"title":"Blinn-Phong 着色器实现","path":"/2025/04/06/BLINN-PHONG-SHADER/","content":"Blinn-Phong 着色器实现概述Blinn-Phong 着色模型是经典 Phong 模型的改进版本，通过引入半角向量(Halfway Vector)优化了高光计算。我们的实现包含完整的顶点和片段着色器处理流程。 核心实现1. 顶点着色器1234567891011121314151617Varyings BlinnPhongShader::vertex(const VertexInput&amp; input) &#123; Varyings output; // 计算世界空间位置 vec4f worldPos4 = uniform_ModelMatrix * vec4f(input.position, 1.0f); output.worldPosition = Vector3&lt;float&gt;(worldPos4.x, worldPos4.y, worldPos4.z); // 变换法线到世界空间 vec4f worldNormal4 = uniform_NormalMatrix * vec4f(input.normal, 0.0f); output.worldNormal = worldNormal4.xyz().normalized(); // 传递UV坐标 output.uv = input.uv; // 计算裁剪空间位置 output.clipPosition = uniform_MVP * vec4f(input.position, 1.0f); return output;&#125; 2. 
片段着色器片段着色器实现了完整的 Blinn-Phong 光照模型： 12345678910111213141516171819202122232425262728293031323334bool BlinnPhongShader::fragment(const Varyings&amp; input, Vector3&lt;float&gt;&amp; outColor) &#123; // 标准化法线和视线方向 Vector3&lt;float&gt; N = input.worldNormal.normalized(); Vector3&lt;float&gt; V = (uniform_CameraPosition - input.worldPosition).normalized(); // 材质属性 Vector3&lt;float&gt; matDiffuse = uniform_Material.diffuseColor; if (uniform_Material.hasDiffuseTexture()) &#123; matDiffuse = matDiffuse * uniform_Material.diffuseTexture.sample(input.uv.x, input.uv.y); &#125; // ...其他材质属性处理 // 光照计算 Vector3&lt;float&gt; totalColor = uniform_AmbientLight * uniform_Material.ambientColor; for (const auto&amp; light : uniform_Lights) &#123; // 计算光线方向 Vector3&lt;float&gt; L = light.getDirectionTo(input.worldPosition); // 漫反射计算 float diffFactor = std::max(0.0f, N.dot(L)); Vector3&lt;float&gt; diffuse = matDiffuse * light.color * diffFactor; // Blinn-Phong 高光计算 Vector3&lt;float&gt; H = (L + V).normalized(); float specFactor = fastPow(std::max(0.0f, N.dot(H)), matShininess); Vector3&lt;float&gt; specular = matSpecular * light.color * specFactor; totalColor += diffuse + specular; &#125; // 颜色钳制 outColor = totalColor.clamp(0.0f, 1.0f); return true;&#125; 关键技术点1. 快速幂计算1234567891011template &lt;typename T&gt;T fastPow(T base, int n) &#123; // 使用快速幂算法优化高光计算 T res = static_cast&lt;T&gt;(1); while (n) &#123; if (n &amp; 1) res = res * base; base = base * base; n &gt;&gt;= 1; &#125; return res;&#125; 2. 光照类型支持 方向光(Directional Light) 点光源(Point Light) 环境光(Ambient Light) 3. 
材质系统 漫反射颜色&#x2F;贴图 高光颜色 光泽度(Shininess) 环境光反射率 使用方法 创建 BlinnPhongShader 实例 设置必要的 uniform 变量: 模型、视图、投影矩阵 材质属性 光源参数 绑定到渲染器使用 性能优化 使用快速幂算法优化高光计算 提前终止无效的光照计算 向量运算的规范化处理 后续改进计划 添加法线贴图支持 实现 PBR 材质系统 支持多光源阴影计算","tags":["计算机图形学","C++","渲染引擎","着色器"],"categories":["Computer Graphics","技术分享"]},{"title":"渲染管线简介","path":"/2025/04/06/RENDERING-PIPELINE/","content":"SoftRasterizer 渲染流程解析概述本文档详细分析 SoftRasterizer 的渲染管线实现，涵盖从模型加载到最终像素输出的完整流程。渲染管线主要分为初始化阶段和每帧渲染阶段。 核心渲染流程1. 初始化阶段1234567891011121314// main.cpp 中的初始化代码Framebuffer framebuffer(width, height);Model model;model.loadFromObj(&quot;resources/obj/african_head.obj&quot;);model.loadDiffuseTexture(&quot;resources/diffuse/african_head_diffuse.tga&quot;);Camera camera(Vector3&lt;float&gt;(0, 1, 3), Vector3&lt;float&gt;(0, 0, 0), Vector3&lt;float&gt;(0, 1, 0));camera.setPerspective(45.0f, aspectRatio, near, far);Renderer renderer(framebuffer, camera);std::vector&lt;Light&gt; lights;// 设置光源...auto shader = std::make_shared&lt;BlinnPhongShader&gt;();renderer.setShader(shader); 2. 每帧渲染阶段2.1 设置Uniform变量123456789// renderer.cpp - drawModel()currentShader-&gt;uniform_ModelMatrix = modelMatrix;currentShader-&gt;uniform_ViewMatrix = viewMatrix; currentShader-&gt;uniform_ProjectionMatrix = projectionMatrix;currentShader-&gt;uniform_MVP = projectionMatrix * viewMatrix * modelMatrix;currentShader-&gt;uniform_NormalMatrix = modelMatrix.inverse().transpose();currentShader-&gt;uniform_CameraPosition = cameraPosition;currentShader-&gt;uniform_Lights = lights;currentShader-&gt;uniform_Material = material; 2.2 顶点处理12345678// shader.hvirtual Varyings vertex(const VertexInput&amp; input) = 0;// 顶点着色器处理流程：// 1. 将顶点位置变换到世界空间// 2. 变换法线到世界空间// 3. 计算裁剪空间位置// 4. 传递UV等属性 2.3 三角形组装与光栅化123456789101112// renderer.cpp// 1. 背面剔除float signedArea = (p1.x - p0.x) * (p2.y - p0.y) - (p2.x - p0.x) * (p1.y - p0.y);if (signedArea &lt; 0) continue;// 2. 
扫描线光栅化for (int y = yStart; y &lt;= yEnd; ++y) &#123; // 沿三角形两边插值 // 在扫描线内插值片段属性 // 深度测试 if (depth &gt;= framebuffer.getDepth(x, y)) continue;&#125; 2.4 片段处理12345678// shader.h virtual bool fragment(const Varyings&amp; input, Vector3&lt;float&gt;&amp; outColor) = 0;// 片段着色器处理流程：// 1. 标准化法线和视线方向// 2. 采样纹理(如果有)// 3. 计算光照(漫反射+高光)// 4. 输出最终颜色 2.5 帧缓冲更新1framebuffer.setPixel(x, y, fragmentColor, depth); 关键技术点1. 透视校正插值1234567Varyings Renderer::interpolateVaryings(float t, const Varyings&amp; start, const Varyings&amp; end, float startInvW, float endInvW) &#123; // 使用1/w进行透视校正插值 float currentInvW = startInvW + (endInvW - startInvW) * t; float currentW = 1.0f / currentInvW; // 对每个属性进行插值&#125; 2. 深度缓冲12345// 深度值映射到[0,1]范围screenVertices[j].z = (ndcPos.z + 1.0f) * 0.5f; // 深度测试if (depth &gt;= framebuffer.getDepth(x, y)) continue; 3. 光照计算优化1234567891011// 使用快速幂算法优化高光计算template &lt;typename T&gt;T fastPow(T base, int n) &#123; T res = static_cast&lt;T&gt;(1); while (n) &#123; if (n &amp; 1) res = res * base; base = base * base; n &gt;&gt;= 1; &#125; return res;&#125; 渲染管线图示12345678910graph TD A[模型加载] --&gt; B[设置相机和光源] B --&gt; C[设置着色器和材质] C --&gt; D[顶点处理] D --&gt; E[三角形组装] E --&gt; F[光栅化] F --&gt; G[片段处理] G --&gt; H[深度测试] H --&gt; I[帧缓冲更新] I --&gt; J[输出图像] 性能优化 提前深度测试：在片段着色器前进行深度测试 背面剔除：减少约50%的三角形处理 快速幂算法：优化高光计算 透视校正插值：保证纹理和属性正确插值 后续改进计划 实现法线贴图支持 添加阴影计算 支持延迟渲染管线 实现多线程渲染","tags":["计算机图形学","C++","渲染引擎","渲染管线"],"categories":["Computer Graphics","技术分享"]},{"title":"透视投影","path":"/2025/04/05/PERSPECTIVE-PROJECTION/","content":"实现软光栅化中的透视投影：从基础渲染到深度测试优化在开发软光栅化渲染器时，透视投影是实现真实感渲染的关键一步。本文基于一次代码修改（git diff），详细讲述如何将一个基础的模型渲染系统升级为支持透视投影的渲染管线，包括矩阵变换、深度处理和透视校正的实现过程。 背景最初的渲染代码（src/main.cpp）使用简单的屏幕空间投影，直接将模型的顶点映射到帧缓冲区，没有考虑透视效果和深度缓冲的正确性： 1model.renderSolid(framebuffer, vec3f(1.0f, 1.0f, 1.0f), vec3f(0.0f, 0.0f, 1.0f)); 目标是引入透视投影，使远处的物体变小，并通过深度测试实现正确的遮挡关系。以下是实现过程的步骤。 步骤 1：引入变换矩阵在 src/main.cpp 中，我们添加了模型、视图和投影矩阵，用于将顶点从模型空间变换到裁剪空间： 12345678910111213float near = 0.1f;float far = 100.0f;mat4 modelMatrix 
= mat4::identity();mat4 viewMatrix = mat4::translation(0, 0, -3); // 相机后移mat4 projectionMatrix = mat4::perspective( 45.0f * 3.1415926f / 180.0f, // FOV (float)width/height, // 宽高比 near, // 近裁剪面 far // 远裁剪面);mat4 mvp = projectionMatrix * viewMatrix * modelMatrix; 模型矩阵：保持不变（identity），后续可添加旋转或缩放。 视图矩阵：将相机向后移动 3 个单位，模拟观察者的位置。 投影矩阵：使用透视投影，定义视锥体（FOV 为 45°）。 MVP 矩阵：组合三者，用于顶点变换。 渲染调用改为： 1model.renderSolid(framebuffer, near, far, mvp, vec3f(1.0f, 1.0f, 1.0f), vec3f(0.0f, 0.0f, 1.0f)); 步骤 2：顶点变换与透视除法在 src/core/model.cpp 中，renderSolid 方法从简单的屏幕映射升级为完整的透视投影管线： 2.1 顶点变换将顶点从模型空间变换到裁剪空间： 12345678910vec4f clip_coords[3];vec3f world_coords[3];float w_values[3];for (int j = 0; j &lt; 3; j++) &#123; world_coords[j] = vertices[face[j]]; vec4f v(world_coords[j], 1.0f); clip_coords[j] = mvp * v; w_values[j] = clip_coords[j].w;&#125; 使用齐次坐标（w&#x3D;1）进行矩阵乘法。 存储 w 值，用于后续透视除法和校正。 2.2 简单裁剪检查丢弃完全在近裁剪面外的三角形： 12345if (clip_coords[0].z &lt; -w_values[0] &amp;&amp; clip_coords[1].z &lt; -w_values[1] &amp;&amp; clip_coords[2].z &lt; -w_values[2]) &#123; continue;&#125; 2.3 透视除法与视口变换将裁剪空间坐标转换为 NDC（标准化设备坐标），并映射到屏幕空间： 12345678910111213141516Vertex vertices[3];for (int j = 0; j &lt; 3; j++) &#123; if (w_values[j] &lt;= 0) continue; float invW = 1.0f / w_values[j]; vec3f ndc( clip_coords[j].x * invW, clip_coords[j].y * invW, clip_coords[j].z * invW ); vertices[j].x = (ndc.x + 1.0f) * fb.width * 0.5f; vertices[j].y = (ndc.y + 1.0f) * fb.height * 0.5f; // ... 深度映射 ... 
vertices[j].u = tex_coords[j].x * invW; vertices[j].v = tex_coords[j].y * invW; vertices[j].w = invW;&#125; 透视除法：除以 w 得到 NDC。 视口变换：将 [-1,1] 范围映射到屏幕坐标。 步骤 3：深度处理优化3.1 深度值映射将视空间的 z 值映射到 [0,1] 范围，靠近相机为 0，远离为 1： 123456float zEye = clip_coords[j].z;if (w_values[j] != 0) &#123; vertices[j].z = (1.0f - (near * far / zEye * invW + near) / (far - near)) * 0.5f + 0.5f;&#125; else &#123; vertices[j].z = 1.0f;&#125; 使用非线性映射，确保透视效果下的深度分布正确。 反转逻辑，使更近的点得到更小的深度值。 3.2 深度测试调整在 src/core/framebuffer.cpp 中，将深度测试改为 “小于” 测试： 1234if (depth &lt; zBuffer[index]) &#123; // z值越小表示越近 zBuffer[index] = depth; pixels[index] = color;&#125; 初始化时将深度缓冲区清为最大值： 1234Framebuffer::Framebuffer(int w, int h) : width(w), height(h), pixels(w * h), zBuffer(w * h, std::numeric_limits&lt;float&gt;::max()) &#123;&#125;void Framebuffer::clearZBuffer() &#123; std::fill(zBuffer.begin(), zBuffer.end(), std::numeric_limits&lt;float&gt;::max());&#125; 步骤 4：透视校正插值在 drawScanlines 中添加透视校正插值，确保纹理随深度正确变化： 12345678910float wa = interpolate&lt;float, int&gt;(vStartA.w, vStartA.y, vEndA.w, vEndA.y, y);float wb = interpolate&lt;float, int&gt;(vStartB.w, vStartB.y, vEndB.w, vEndB.y, y);// ...float w = wa + (wb - wa) * t;if (useTexture &amp;&amp; w != 0) &#123; float invW = 1.0f / w; float u = (ua + (ub - ua) * t) * invW; float v = (va + (vb - va) * t) * invW; finalColor = texture.sample(u, v) * color;&#125; 插值 1&#x2F;w 而不是直接插值纹理坐标。 在最终采样前除以 w，实现透视校正。 成果与反思通过以上步骤，我们实现了： 透视投影：物体随距离变小。 深度测试：靠近相机的物体遮挡远处的物体。 纹理校正：纹理随视角正确变形。 然而，这仍是一个简化实现。未来的改进可以包括： 更复杂的裁剪算法（处理跨越裁剪面的三角形）。 支持透视投影下的背面剔除。 优化性能（如 SIMD 加速）。 代码已成功渲染出带有透视效果的非洲人头模型，保存为 output.tga。这是一个软光栅化学习过程中的重要里程碑！","tags":["C++","Rendering","Perspective Projection"],"categories":["Computer Graphics","技术分享"]},{"title":"摄像机的实现","path":"/2025/04/05/CAMERA/","content":"概述本文档详细记录了在 SoftRasterizer 项目中实现 Camera 类的全过程，包括其设计、核心代码、应用方式以及验证方法。通过该类，我们实现了灵活的相机控制，支持 OpenGL 风格的视图变换。 核心实现1. 
Camera 类定义1234567891011121314151617181920// camera.h#pragma once#include &quot;math/matrix.h&quot;#include &quot;math/vector.h&quot;class Camera &#123;public: Camera(const vec3f&amp; position, const vec3f&amp; target, const vec3f&amp; up); void setPerspective(float fovDegrees, float aspectRatio, float near, float far); mat4 getMVP(const mat4&amp; modelMatrix) const; void setPosition(const vec3f&amp; position);private: vec3f m_position; // 相机位置 vec3f m_target; // 目标点 vec3f m_up; // 上方向 mat4 m_viewMatrix; // 视图矩阵 mat4 m_projMatrix; // 投影矩阵 void updateViewMatrix(); // 更新视图矩阵&#125;; 2. Camera 类实现123456789101112131415161718192021222324252627282930313233343536// camera.cpp#include &quot;core/camera.h&quot;Camera::Camera(const vec3f&amp; position, const vec3f&amp; target, const vec3f&amp; up) : m_position(position), m_target(target), m_up(up) &#123; updateViewMatrix(); m_projMatrix = mat4::identity();&#125;void Camera::setPerspective(float fovDegrees, float aspectRatio, float near, float far) &#123; m_projMatrix = mat4::perspective(fovDegrees * 3.1415926f / 180.0f, aspectRatio, near, far);&#125;mat4 Camera::getMVP(const mat4&amp; modelMatrix) const &#123; return m_projMatrix * m_viewMatrix * modelMatrix;&#125;void Camera::setPosition(const vec3f&amp; position) &#123; m_position = position; updateViewMatrix();&#125;void Camera::updateViewMatrix() &#123; vec3f forward = (m_target - m_position).normalized(); vec3f right = forward.cross(m_up).normalized(); vec3f up = right.cross(forward).normalized(); mat4 rotation; rotation.m[0][0] = right.x; rotation.m[0][1] = right.y; rotation.m[0][2] = right.z; rotation.m[0][3] = 0; rotation.m[1][0] = up.x; rotation.m[1][1] = up.y; rotation.m[1][2] = up.z; rotation.m[1][3] = 0; rotation.m[2][0] = -forward.x; rotation.m[2][1] = -forward.y; rotation.m[2][2] = -forward.z; rotation.m[2][3] = 0; rotation.m[3][0] = 0; rotation.m[3][1] = 0; rotation.m[3][2] = 0; rotation.m[3][3] = 1; mat4 translation = mat4::translation(-m_position.x, 
-m_position.y, -m_position.z); m_viewMatrix = rotation * translation;&#125; 3. 主循环集成12345678910111213141516171819202122232425262728293031// main.cppint main() &#123; const int width = 800, height = 800; Framebuffer framebuffer(width, height); framebuffer.clear(vec3f(0.5f, 0.5f, 0.5f)); framebuffer.clearZBuffer(); Model model; if (!model.loadFromObj(&quot;resources/obj/african_head.obj&quot;) || !model.loadDiffuseTexture(&quot;resources/diffuse/african_head_diffuse.tga&quot;)) &#123; std::cerr &lt;&lt; &quot;Failed to load model or texture&quot; &lt;&lt; std::endl; return 1; &#125; float near = 0.1f, far = 100.0f; Camera camera(vec3f(0, 0, 3), vec3f(0, 0, 0), vec3f(0, 1, 0)); camera.setPerspective(45.0f, (float)width / height, near, far); mat4 modelMatrix = mat4::identity(); mat4 mvp = camera.getMVP(modelMatrix); model.renderSolid(framebuffer, near, far, mvp, vec3f(1.0f, 1.0f, 1.0f), vec3f(0.0f, 0.0f, -1.0f)); framebuffer.flipVertical(); if (!framebuffer.saveToTGA(&quot;output.tga&quot;)) &#123; std::cerr &lt;&lt; &quot;Failed to save image&quot; &lt;&lt; std::endl; return 1; &#125; std::cout &lt;&lt; &quot;Rendered image saved to output.tga&quot; &lt;&lt; std::endl; return 0;&#125; 技术要点 坐标系：采用右手坐标系，+Z 为屏幕外，相机默认朝向由目标点决定。 视图矩阵：通过 lookAt 方法生成，先平移到相机原点，再旋转到相机坐标系。 退化处理：当 forward 和 up 平行时，需调整 up（如从 +Y 看 -Y 时用 -Z）。 投影矩阵：支持透视投影，FOV 转换为弧度，确保与 OpenGL 一致。 应用与验证应用场景 正面视角：相机位于 (0, 0, 3)，朝向 (0, 0, 0)，光照从 +Z 到 -Z，看到 african_head.obj 正面。 灵活调整：通过 setPosition 和目标点调整相机位置和朝向。 验证方法 正面验证： 配置：Camera(vec3f(0, 0, 3), vec3f(0, 0, 0), vec3f(0, 1, 0)) 光照：(0, 0, -1) 预期：看到模型正面。 背面验证： 配置：Camera(vec3f(0, 0, -3), vec3f(0, 0, 0), vec3f(0, 1, 0)) 光照：(0, 0, 1) 预期：看到模型背面。 侧面验证： 配置：Camera(vec3f(3, 0, 0), vec3f(0, 0, 0), vec3f(0, 1, 0)) 光照：(-1, 0, 0) 预期：看到模型右侧。 顶部验证： 配置：Camera(vec3f(0, 3, 0), vec3f(0, 0, 0), vec3f(0, 0, -1)) 光照：(0, -1, 0) 预期：看到模型顶部，无退化。 左前方验证： 配置：Camera(vec3f(-2, 0, 3), vec3f(0, 0, 0), vec3f(0, 1, 0)) 光照：(0.707, 0, -0.707) 预期：看到模型左前方。 总结通过实现 Camera 类，我们成功支持了灵活的相机控制，能够正确渲染 
african_head.obj 的各个角度。验证过程确认了视图矩阵、光照和坐标系的一致性，确保了渲染结果符合预期。","tags":["Rendering","Camera System","View Matrix"],"categories":["Computer Graphics","技术分享"]},{"title":"漫反射材质","path":"/2025/04/04/DIFFUSE-TEXTURE/","content":"概述本文档详细记录了在 SoftRasterizer 项目中实现 diffuse 材质加载的全过程，解决了初始加载失败的问题，使得模型能够正确显示纹理效果。 核心修改1. Texture 类扩展1234567891011121314151617181920212223242526272829303132333435// 定义 Texture 类支持 TGA 加载class Texture &#123;public: int width = 0; int height = 0; std::vector&lt;vec3f&gt; pixels; bool loadFromTGA(const std::string&amp; filename); vec3f sample(float u, float v) const; bool empty() const &#123; return pixels.empty() || width == 0 || height == 0; &#125;&#125;;// 实现 TGA 文件加载bool Texture::loadFromTGA(const std::string&amp; filename) &#123; std::vector&lt;unsigned char&gt; data; if (!loadTGA(filename, width, height, data)) &#123; return false; &#125; pixels.resize(width * height); for (int y = 0; y &lt; height; y++) &#123; for (int x = 0; x &lt; width; x++) &#123; int idx = (y * width + x) * 3; pixels[y * width + x] = vec3f( data[idx] / 255.0f, // R data[idx + 1] / 255.0f, // G data[idx + 2] / 255.0f // B ); &#125; &#125; return true;&#125; 2. 
支持 RLE 压缩的 TGA 加载12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758bool loadTGA(const std::string&amp; filename, int&amp; width, int&amp; height, std::vector&lt;unsigned char&gt;&amp; data) &#123; std::ifstream file(filename, std::ios::binary); if (!file.is_open()) return false; TGAHeader header; file.read(reinterpret_cast&lt;char*&gt;(&amp;header), sizeof(header)); // 支持未压缩 (2) 和 RLE 压缩 (10) 的 24 位 RGB 图像 if ((header.datatypecode != 2 &amp;&amp; header.datatypecode != 10) || header.bitsperpixel != 24) &#123; std::cerr &lt;&lt; &quot;Unsupported TGA format&quot; &lt;&lt; std::endl; return false; &#125; width = header.width; height = header.height; data.resize(width * height * 3); file.seekg(header.idlength + header.colormaplength * (header.colormapdepth / 8), std::ios::cur); if (header.datatypecode == 2) &#123; file.read(reinterpret_cast&lt;char*&gt;(data.data()), data.size()); &#125; else if (header.datatypecode == 10) &#123; size_t pixelCount = width * height; size_t currentPixel = 0; unsigned char pixel[3]; while (currentPixel &lt; pixelCount) &#123; unsigned char chunkHeader; file.read(reinterpret_cast&lt;char*&gt;(&amp;chunkHeader), 1); if (chunkHeader &lt; 128) &#123; // Raw packet size_t count = chunkHeader + 1; for (size_t i = 0; i &lt; count &amp;&amp; currentPixel &lt; pixelCount; ++i) &#123; file.read(reinterpret_cast&lt;char*&gt;(pixel), 3); data[currentPixel * 3] = pixel[0]; data[currentPixel * 3 + 1] = pixel[1]; data[currentPixel * 3 + 2] = pixel[2]; currentPixel++; &#125; &#125; else &#123; // RLE packet size_t count = chunkHeader - 127; file.read(reinterpret_cast&lt;char*&gt;(pixel), 3); for (size_t i = 0; i &lt; count &amp;&amp; currentPixel &lt; pixelCount; ++i) &#123; data[currentPixel * 3] = pixel[0]; data[currentPixel * 3 + 1] = pixel[1]; data[currentPixel * 3 + 2] = pixel[2]; currentPixel++; &#125; &#125; &#125; &#125; // BGR 转 RGB for (size_t i = 0; i &lt; data.size(); i += 3) &#123; 
std::swap(data[i], data[i + 2]); &#125; return true;&#125; 3. Model 类集成123456789101112class Model &#123;public: Texture diffuseTexture; bool loadDiffuseTexture(const std::string&amp; filename) &#123; return diffuseTexture.loadFromTGA(filename); &#125; void renderSolid(Framebuffer&amp; fb, const vec3f&amp; lightDir, const vec3f&amp; eye) &#123; // 使用 diffuseTexture 进行渲染... &#125;&#125;; 4. 主程序调整12345678910111213141516171819202122int main() &#123; Framebuffer framebuffer(800, 800); framebuffer.clear(vec3f(0.5f, 0.5f, 0.5f)); framebuffer.clearZBuffer(); Model model; if (!model.loadFromObj(&quot;resources/obj/african_head.obj&quot;)) &#123; std::cerr &lt;&lt; &quot;Failed to load model&quot; &lt;&lt; std::endl; return 1; &#125; if (!model.loadDiffuseTexture(&quot;resources/diffuse/african_head_diffuse.tga&quot;)) &#123; std::cerr &lt;&lt; &quot;Failed to load texture&quot; &lt;&lt; std::endl; return 1; &#125; model.renderSolid(framebuffer, vec3f(1.0f, 1.0f, 1.0f), vec3f(0.0f, 0.0f, 1.0f)); framebuffer.flipVertical(); framebuffer.saveToTGA(&quot;output.tga&quot;); return 0;&#125; 技术要点 TGA 格式支持：扩展 loadTGA 函数，支持 datatypecode &#x3D;&#x3D; 2（未压缩 RGB）和 datatypecode &#x3D;&#x3D; 10（RLE 压缩 RGB）。 RLE 解码：实现 RLE 压缩的解码逻辑，处理 raw 和 RLE 数据包。 颜色转换：将 TGA 文件的 BGR 格式转换为 RGB 格式。 错误处理：添加详细的调试输出，确保加载失败时能定位问题。 验证方法 检查输出图像 output.tga，确认模型表面显示正确的 diffuse 纹理。 验证纹理坐标 (u, v) 的插值是否正确，纹理无拉伸或错位。 确保 RLE 压缩的 TGA 文件能够正常加载并渲染。 检查程序运行时无 “Failed to load texture” 错误输出。 修改过程回顾最初，程序因 “Failed to load texture” 而失败，原因是 african_head_diffuse.tga 文件使用了 RLE 压缩（datatypecode &#x3D;&#x3D; 10），而原始代码只支持未压缩格式（datatypecode &#x3D;&#x3D; 2）。通过调试输出确认问题后，我扩展了 loadTGA 函数，添加了对 RLE 压缩的支持，最终成功加载并渲染了 diffuse 材质。","tags":["Rendering","Texture Mapping"],"categories":["Computer Graphics","技术分享"]},{"title":"Z-Buffer 深度缓冲的实现","path":"/2025/04/04/Z-BUFFER-IMPLEMENTATION/","content":"概述本文档详细记录了在SoftRasterizer项目中实现Z-Buffer深度测试的全过程。 核心修改1. 
Framebuffer class changes // Add a depth buffer std::vector&lt;float&gt; zBuffer; // Initialization Framebuffer::Framebuffer(int w, int h) : width(w), height(h), pixels(w * h), zBuffer(w * h, std::numeric_limits&lt;float&gt;::lowest()) &#123;&#125; // Clear the depth buffer void clearZBuffer() &#123; std::fill(zBuffer.begin(), zBuffer.end(), std::numeric_limits&lt;float&gt;::lowest());&#125; 2. Depth test implementation void setPixel(int x, int y, const vec3f&amp; color, float depth) &#123; if (x &gt;= 0 &amp;&amp; x &lt; width &amp;&amp; y &gt;= 0 &amp;&amp; y &lt; height) &#123; int index = y * width + x; // Right-handed coordinates: a larger z value is closer to the viewer, so the larger depth wins if (depth &gt; zBuffer[index]) &#123; zBuffer[index] = depth; pixels[index] = color; &#125; &#125;&#125; 3. Triangle rendering improvements void drawTriangle(/* parameters */) &#123; // Vertex sorting and degenerate-triangle detection... // Render the top half for (int y = y0; y &lt;= y1; y++) &#123; // Bounds check if (y &lt; 0 || y &gt;= height) continue; // Coordinate interpolation int xa = interpolate(x0, y0, x2, y2, y); int xb = interpolate(x0, y0, x1, y1, y); float za = (y2 != y0) ? z0 + (z2 - z0) * (y - y0) / (y2 - y0) : z0; // ...remaining code &#125; // The bottom half is rendered similarly...&#125; 4. 
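Main loop integration // Clear every frame framebuffer.clear(vec3f(0.1f, 0.1f, 0.1f)); framebuffer.clearZBuffer(); Key technical points Coordinate system: right-handed, with +Z pointing behind the viewer. Depth comparison: the depth test uses the &gt; operator. Initial value: lowest() rather than min(). Boundary handling: thorough out-of-bounds checks and division-by-zero protection. Verification Near objects correctly occlude far objects. No broken triangles. Depth transitions across surfaces are smooth. No flickering or Z-fighting.","tags":["Rendering","Depth Buffer"],"categories":["Computer Graphics"]},{"title":"OBJ Model Loading and Triangle Rendering","path":"/2025/04/03/OBJ-RENDERING-IMPLEMENTATION/","content":"OBJ Model Loading and Triangle Rendering Determining the coordinate system This renderer uses a left-handed coordinate system, based on the following evidence: Static analysis Check the vertex transform: // No Z-axis flip; the original direction is kept screen_coords[j] = vec2i((v.x+1)*fb.width/2, (v.y+1)*fb.height/2); // Z is left unchanged and used directly for depth comparison Check the normal computation: vec3f normal = (v2-v0).cross(v1-v0).normalized(); // The cross-product order sets the normal direction, consistent with a left-handed system Check the lighting computation: float intensity = normal.dot(lightDir.normalized()); // With lightDir=(0,0,1), front-facing faces (intensity&gt;0) are rendered Dynamic verification Create a test triangle: vertices = &#123;&#123;0,1,0&#125;, &#123;-1,-1,0&#125;, &#123;1,-1,0&#125;&#125;; // facing +z Observe the effect of different light directions: with lightDir(0,0,1) the triangle should be visible; with lightDir(0,0,-1) it should not be visible. Implementation overview This iteration implements OBJ model loading and triangle rendering, mainly: OBJ file format parsing; triangle face rendering; basic lighting; back-face culling. Core implementation 1. OBJ file loading bool Model::loadFromObj(const std::string&amp; filename) &#123; // Parse vertex data if (type == &quot;v&quot;) &#123; vec3f v; iss &gt;&gt; v.x &gt;&gt; v.y &gt;&gt; v.z; vertices.push_back(v); &#125; // Parse face data else if (type == &quot;f&quot;) &#123; // Handle the v/vt/vn family of formats while (iss &gt;&gt; v) &#123; face.push_back(v - 1); // OBJ uses 1-based indices if (iss.peek() == &#x27;/&#x27;) &#123; // Handle texture/normal indices... &#125; &#125; &#125;&#125; 2. Triangle rendering and lighting void Model::renderSolid(Framebuffer&amp; fb, const vec3f&amp; color, const vec3f&amp; lightDir) &#123; // Compute the face normal vec3f normal = calculateFaceNormal(face); // Lighting (Lambert model) float intensity = normal.dot(lightDir.normalized()); if (intensity &gt; 0) &#123; // Back-face culling vec3f shadedColor = color * intensity; // Rasterize the triangle fb.drawTriangle(x0,y0, x1,y1, x2,y2, shadedColor); &#125;&#125; 3. 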
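Normal computation vec3f Model::calculateFaceNormal() const &#123; vec3f edge1 = v1 - v0; vec3f edge2 = v2 - v0; return edge1.cross(edge2).normalized();&#125; Key technical points OBJ format parsing: supports vertex&#x2F;texture&#x2F;normal coordinates; handles the various face definition formats (v, v&#x2F;vt, v&#x2F;&#x2F;vn, v&#x2F;vt&#x2F;vn); converts 1-based indices to 0-based. Rendering optimizations: back-face culling skips faces whose dot product is ≤ 0; a scanline algorithm fills triangles efficiently; normal interpolation uses vertex normals or geometric normals. Lighting model: simple Lambert diffuse reflection; the light direction is normalized; color intensity is scaled linearly. Usage Model model; model.loadFromObj(&quot;model.obj&quot;); // Set the light direction (pointing into the screen) vec3f lightDir(0,0,1); // Render the model (white) model.renderSolid(fb, vec3f(1,1,1), lightDir); Verifying the result Rendering the test model should give: faces with the correct orientation are rendered; faces turned away from the light are culled; light intensity varies with the angle. Next steps Implement Z-buffer depth testing; add texture mapping support; implement the Phong lighting model.","tags":["计算机图形学","C++","模型渲染"],"categories":["Computer Graphics","技术分享"]},{"title":"Soft Rasterizer Development Log","path":"/2025/04/01/SOFTRASTERIZER-INTRODUCTION/","content":"Soft Rasterizer: Development Milestone Project overview We implemented a basic software rasterizer with the following characteristics: built entirely from scratch, with no dependency on a graphics API; uses only the standard library and basic math; supports basic pixel drawing and image output. Core features 1. Math library // Vector template class template&lt;typename T&gt; struct Vector3 &#123; T x, y, z; // Vector operations...&#125;; // 4x4 matrix struct mat4 &#123; float m[4][4]; // Matrix operations and transforms...&#125;; 2. Framebuffer management class Framebuffer &#123; int width, height; std::vector&lt;vec3f&gt; pixels; public: // Clearing the screen, drawing pixels, and so on... bool saveToTGA(const std::string&amp; filename);&#125;; 3. TGA image output Image output in the Truevision TGA format is implemented: supports 24-bit RGB; writes the complete file header structure; stores pixel data in BGR order. Project structure SoftRasterizer/ ├── include/ │ ├── math/ # math library │ └── core/ # core rendering components ├── src/ │ ├── io/ # file I/O │ └── core/ # implementation code └── CMakeLists.txt # build configuration Usage Build the project: cmake -S . 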
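-B build cmake --build build --config Release Run the program: ./build/Release/SoftRasterizer.exe Next steps Implement OBJ model loading; add triangle rasterization; support a depth buffer (Z-buffer); implement a basic lighting model. View the full code","tags":["计算机图形学","C++","渲染引擎"],"categories":["技术分享"]},{"title":"Line Drawing in the Soft Rasterizer","path":"/2025/04/01/LINE-DRAWING-ALGORITHM/","content":"Basic Line Rasterization: a Bresenham Implementation About the algorithm The Bresenham algorithm is the most basic line rasterization algorithm in computer graphics. It uses integer arithmetic to efficiently pick the pixels that best approximate the line. Key properties Pure integer arithmetic, with no floating-point computation; avoids multiplication and division in the loop, using only additions, subtractions, and bit operations; generates one pixel per step, for O(n) time. How it works Basic idea The algorithm uses an error term to choose the next pixel: step along x; compute the slope Δy&#x2F;Δx; maintain an error term that tracks the distance between the true line and the pixel center; use the error to decide whether to increment or decrement y. Key setup bool steep = abs(y1 - y0) &gt; abs(x1 - x0); // is the line steep? if (steep) &#123; std::swap(x0, y0); std::swap(x1, y1); &#125; // transpose both endpoints so a steep line is handled as a shallow one if (x0 &gt; x1) &#123; std::swap(x0, x1); std::swap(y0, y1); &#125; // swap the endpoints so drawing runs left to right int dx = x1 - x0; int dy = abs(y1 - y0); int err = dx / 2; // initial error Interface Added to the Framebuffer class: class Framebuffer &#123;public: // ... void drawLine(int x0, int y0, int x1, int y1, const vec3f&amp; color);&#125;; Test cases Lines are drawn in several directions: // Horizontal line (red) framebuffer.drawLine(100, 100, 700, 100, vec3f(1,0,0)); // Vertical line (blue) framebuffer.drawLine(400, 100, 400, 500, vec3f(0,0,1)); // Diagonal line (green) framebuffer.drawLine(100, 150, 700, 500, vec3f(0,1,0)); Verifying the result The generated image should contain: a 3D model wireframe with the correct orientation; all edges fully connected; no broken or missing pixels. Coordinate system notes Coordinates are converted when the model is rendered: X coordinate: mapped without flipping (x1 &#x3D; (v1.x + 1) * width &#x2F; 2). Y coordinate: flipped to match screen coordinates (y1 &#x3D; height - (v1.y + 1) * height &#x2F; 2). Z coordinate: ignored for now. Further reading The original Bresenham paper; algorithm optimization tips; back to the project home page","tags":["计算机图形学","C++","渲染引擎"],"categories":["Computer Graphics","技术分享"]},{"title":"Computer Graphics, Chapter 2: Graphics Systems","path":"/2025/03/22/计算机图形学——第2章：图形系统/","content":"Computer Graphics: Chapter 2 Graphics Systems Graphics processing on a computer requires a computer graphics system made up of hardware and software, which is what we call the supporting environment. This chapter discusses the principles and methods by which a computer graphics system carries out graphics display tasks, and introduces the main software and hardware involved. It closes with an introduction to and analysis of the graphics pipeline. 2.1 Overview of graphics systems 2.1.1 Graphics hardware Graphics display devices are used to view and modify graphics and are a powerful tool for interactive graphics processing. Graphics plotting devices output graphics to a physical medium. They fall into raster dot-matrix devices (printers) and random vector devices (pen plotters). 2.1.2 Graphics software Graphics programs in the broad sense. They can be divided into three parts: graphics application software, graphics support software, and graphics application data structures. By analogy with the formula “program &#x3D; algorithm + data structure” proposed by the father of the Pascal language, we have: graphics program = graphics algorithm + graphics application data structure 2.2 Graphics hardware 2.2.1 Graphics display devices Cathode-ray tube; liquid-crystal display 2.2.2 Graphics display modes Random-scan display; raster-scan display 2.2.3 Raster-scan display systems In such a system the electron beam sweeps across the screen from left to right and top to bottom, one line at a time. As the beam moves along each line, its intensity varies to build up a pattern of bright spots, forming the image shown on the screen. A raster-scan display system consists of three parts: 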
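the display, the video controller, and the frame buffer memory. The image on the display screen is refreshed from the frame buffer, and the video controller is the component that performs the refresh. The two common kinds of raster display today are color cathode-ray tubes and liquid-crystal displays. Structure of a raster-scan display system 2.2.4 Graphics cards and graphics processors Graphics card A graphics card (Video Card, Graphics Card), also called a display interface card or display adapter, is the bridge between the host and the display. It controls the graphics output of the computer, converting the image data sent by the CPU into a format the display accepts and then sending it to the display to form the image. The components of a graphics card and their relationship to peripheral devices are shown in the figure: 2.3 Graphics software 2.4 The graphics pipeline 2.4.1 The three stages of the graphics pipeline The application stage generally supplies data to the graphics hardware as primitives, such as the points, lines, or polygons that describe a 3D geometric model. It also supplies the images or bitmaps used for surface texture mapping. The geometry processing stage processes geometric primitives vertex by vertex and transforms them from 3D coordinates to 2D screen coordinates. This stage runs on the GPU. Its goal is to determine which geometric objects can be shown on screen and to assign color values to the vertices of those objects. It can be further divided into vertex transformation, projection, clipping, vertex shading, and other sub-stages. In the rasterization stage, screen objects are first passed to the pixel processor for rasterization, each pixel is shaded, and the result is output to the display. The aim is to give every pixel the correct color so that the whole image is drawn correctly. This process is called rasterization or scan conversion. 2.4.2 Key steps of the graphics pipeline"},{"title":"Computer Graphics, Chapter 1: Introduction","path":"/2025/03/22/计算机图形学——第1章：绪论/","content":"Computer Graphics: Chapter 1 Introduction “Graphics are the window through which humans talk to computers, and computer graphics is the key that opens that window.” Computer graphics is the discipline that studies how to use computers to generate, process, and display graphics. It is an important branch of computer science that also draws on mathematics, physics, and art. From the earliest simple wireframes to real-time ray tracing today, computer graphics has profoundly changed the way we live. This chapter introduces the world of computer graphics and explores its definition, scope, and history. We start from the 4W questions (What, Why, Where, When) and work toward a clear picture of the field. 1.1 Definition and scope of computer graphics Before studying computer graphics, we need to be clear about its definition and its objects of study. Put simply, computer graphics studies how to generate and process graphics with a computer. It is concerned not only with generating images but also with making those images more realistic and more efficient to produce. Definition The core of computer graphics lies in “shape” and “light”: Shape: the modeling and representation of geometry, for example how to describe a 3D object mathematically. Light: the simulation of lighting, for example how to compute the interaction between light rays and objects algorithmically. Scope The research topics of computer graphics can be grouped as follows: Modeling: how do we describe 3D objects mathematically? Rendering: how do we turn a 3D model into a 2D image? Animation: how do we make static objects move? Interaction: how do we let users interact with graphics in real time? Through this work, computer graphics turns abstract mathematical models into intuitive visuals and gives users an immersive experience. 1.2 Graphics and how they differ from images When studying graphics we often meet the two concepts “graphics” and “image”. They look similar but differ fundamentally. 1. Graphics We live in a real world full of shapes. Natural things such as flowers and trees, and man-made things such as buildings and vehicles, can all be abstracted as “shape”. In computer graphics, graphics are geometric shapes described by mathematical models, such as points, lines, and surfaces. Characteristics: graphics are vector-based and can be scaled without loss. Applications: CAD design, game modeling. 2. Images An image is the result of sampling the real world and is usually stored as pixels. Photos, video, and on-screen content are all made of pixels. Characteristics: images are raster-based, and scaling may distort them. Applications: digital photography, image processing. 3. The relationship between graphics and images From “shape” to “picture”: rendering turns graphics into images. For example, a 3D model becomes a 2D picture after lighting is computed. From “picture” to “shape”: reverse engineering can reconstruct graphics from images. For example, image recognition can extract the outline of an object. 4. 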
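Comparison of graphics and images The following compares graphics and images along several dimensions. Definition. Graphics: geometric shapes described by mathematical models, usually stored in vector form (points, lines, surfaces). Images: the result of sampling the real world, stored as pixels in raster form (photos, for example). Storage. Graphics: stored as mathematical formulas or vector data (such as SVG files), with small file sizes. Images: stored as a pixel grid (such as PNG or JPEG files), with larger file sizes. Scaling. Graphics: scale without loss and do not distort when enlarged. Images: scaling may distort them, and enlarging produces pixelation (jagged edges). Creation. Graphics: generated algorithmically, usually through modeling and rendering techniques. Images: produced by sampling with a device (such as a camera) or by rendering graphics. Editing. Graphics: geometric attributes such as coordinates and shape are modified directly, so editing is flexible. Images: pixels are edited in image-processing software (such as Photoshop), so changes are more involved. Applications. Graphics: CAD design, 3D game models, vector illustration, and other fields that need precise modeling. Images: digital photography, video frames, web pictures, and other visual presentation. Relation to computer graphics. Graphics: the core object of study, concerned with generating and manipulating geometry. Images: the rendered result, obtained from graphics through rendering. Conversion. Graphics: can be turned into images by rendering (such as rasterization). Images: graphics information can be extracted from them by reverse engineering (such as image recognition). 5. Summary Graphics emphasize mathematical description and editability and are the starting point of computer graphics. Images emphasize visual presentation and directness and are the final output form of graphics. 1.3 The 4W questions of computer graphics To understand computer graphics more fully, we can start from the following 4W questions: What Computer graphics is the discipline that studies how to generate and process graphics; its core is the combination of “shape” and “light”. Why Graphics is not only a technology but also a meeting of art and science. It has driven the growth of games, film, virtual reality, and other industries, and gives people a more intuitive means of expression. Where Applications of computer graphics are everywhere: Entertainment: film visual effects (such as Avatar), game rendering (such as Cyberpunk 2077). Science: medical imaging, weather simulation. Industry: architectural design, automotive modeling. When Learning computer graphics requires some mathematical background (such as linear algebra and calculus) and programming ability (such as C++ or Python). It is best to start once these basics are in place. 1.4 Chapter summary As the opening chapter on computer graphics, this chapter introduced its definition and scope and the difference between graphics and images. Through the 4W questions we gained a first view of the core content and applications of the field. The next chapter looks at the basic mathematical tools of graphics, laying the groundwork for later study. “The appeal of graphics is that it lets us not only see the world but also create it.”","tags":["计算机图形学","基础知识","绪论"]},{"title":"Hello World","path":"/2025/03/20/hello-world/","content":"Welcome to Hexo! This is your very first post. Check documentation for more info. If you get any problems when using Hexo, you can find the answer in troubleshooting or you can ask me on GitHub. Quick Start Create a new post $ hexo new &quot;My New Post&quot; More info: Writing Run server $ hexo server More info: Server Generate static files $ hexo generate More info: Generating Deploy to remote sites $ hexo deploy More info: Deployment"}]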