Written for the PC-GPE by Sean Barrett.

     +-----------------------------------------------+
     | THIS FILE MAY NOT BE DISTRIBUTED              |
     | SEPARATELY FROM THE ENTIRE PC-GPE COLLECTION. |
     +-----------------------------------------------+

-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=-

                          TEXTURE MAPPING
   almost everything you need to know to texture map with the PC

-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=-

                 Copyright 1994 Sean Barrett.
                 Not for reproduction (electronic
                 or hardcopy) except for personal use.
Contents

   1. terminology and equations
      Perfect texture mapping
      Non-rectangular polygons
      going off the texture map
      it'll never be fast enough
      handling arbitrarily-angled polygons
      slow 16-bit VGA cards

   | Note: I am not providing any references as I simply
   | derived the math myself and worked out the various
   | techniques for myself (the 32-bit ADC trick was
   | pointed out to me in another context by TJC,
   | author of the Mars demo) over the last two years
   | (since Wolfenstein 3D and Underworld came out).
I assume a RIGHT-handed 3D coordinate system, with X positive
to the right, Y positive disappearing into the screen/distance,
and Z positive up. To adjust this to the typical left-handed
3D space, simply swap all the 3D Ys & Zs.

I assume the screen space is positive-X to the right,
positive-Y goes down. Adjust the signs as appropriate for
your own screen space.

I will present code and pseudo-code in C. I also include
some relatively tight inner loops written in assembly, but I'm
omitting the details of the loop setup. The inner loops, while
usually from real, working code, should generally be taken
as examples showing how fast it ought to be possible to run
a given task, not as necessarily perfect examples. I often
use 32-bit instructions (sorry 286 programmers) because they
can double the performance. However, I write in real mode,
because 16-bit addressing is often convenient for texture maps,
and it's straightforward to use segment registers as pointers
to texture maps. The translation to protected mode should not
prove problematic, but again, these should more be taken as
examples rather than simply being used directly. I optimize
for the 486, but I skip some obvious optimizations. For example,
I write "loop", because it's simpler and clearer; production
code for the 486 should explicitly decrement and then branch.
Similarly, I write "stosb", etc. etc.
1. Terminology and Equations

You really probably don't want to read this section first,
but rather refer back to it whenever you feel the need. So
skip up to section 2 and refer back as appropriate. I could've
made this an appendix, but it seems too important to put last.

texture: A texture is a pixelmap of colors which is mapped
     onto a polygon using "texture mapping". The size
     of the polygon has nothing to do with the size (the
     number of pixels) of the texture map.

run: A run is a row or column of pixels. Normal texture
     mapping routines process one run in an "inner loop".

arbitrarily-angled polygon: a polygon which isn't a floor,
     wall, or ceiling; technically, a polygon which isn't
     parallel to the X or Z axes (or X or Y axes in a
     Z-is-depth coordinate system).

texture space, polygon coordinate space:
     Since a texture is flat, or two-dimensional, the relation
     of the texture to the 3D world can be described with a
     special coordinate space known by one of these names.
     Because it is only 2D, the space can be characterized by
     the location of the texture space in 2D, and two 3D vectors
     which represent the axes of the coordinate space. Sometimes
     called "uv" space, because the names of the coordinates are
     u and v.

Notation:

     Vectors appear in all caps.
     Components of vectors are P = < Px, Py, Pz >.

     Certain variables have consistent usage:

        x,y,z are coordinates in three-space
        i,j   are screen coordinates
        u,v   are coordinates in texture space
        a,b,c are "magic coordinates" such that
              u = a/c and v = b/c

Don't let this scare you off! Go read section 2, and
come back to this when you're ready.
Let P, M, and N be vectors defining the texture space: P is the
origin, and M and N are the vectors for the u&v axes.

Assume these vectors are in _view space_, where view space is
defined as being the space in which the transformation from
3D to the 2D screen is simply

     i = x / y
     j = -z / y

In other words, you have to adjust P, M, and N to be relative
to the view, and if you have multiplications in your perspective
computation, you have to multiply the appropriate components of
P, M, and N to compute them. Note that since typically in 3D
up is positive, and in 2D down is positive, there may be a
multiplication by -1 that needs to go into Pz, Mz, Nz. Note that
this also assumes that (0,0) in screen space is at the center
of the screen. Since it's generally not, simply translate
your screen coordinates (i,j) so they are centered before applying
the texture mapping math (or if you're funky you can modify
your viewspace to pre-skew them).

For example, if your transforms are:

     i = Hscale * x / y + Hcenter
     j = -Vscale * z / y + Vcenter

then you should simply multiply Px, Mx, and Nx by Hscale,
and multiply Pz, Mz, and Nz by -Vscale. Then just remember
to subtract Hcenter and Vcenter from the i,j values right
before plugging them into the texture mapping equations.
We begin by computing 9 numbers which are constant
for texture mapping the entire polygon (O stands for
Origin, H for Horizontal, and V for Vertical; why I use
these names should become clear eventually).

Then, for a given screen location (i,j), the formula
for the texture space (u,v) coordinates corresponding to
that location is:

     a = Oa + i*Ha + j*Va
     b = Ob + i*Hb + j*Vb
     c = Oc + i*Hc + j*Vc

     u = a / c
     v = b / c
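The nine constants themselves can be written compactly as cross products of P, M, and N. The derivation below is my own reconstruction under the view-space conventions stated in this section (i = x/y, j = -z/y, centered screen); double-check the signs against your own screen transform before using it:

```latex
\mathbf{D} = \langle\, i,\ 1,\ -j \,\rangle
  \quad\text{(the ray through pixel } (i,j)\text{)}

\text{Solving } t\,\mathbf{D} = \mathbf{P} + u\,\mathbf{M} + v\,\mathbf{N}
  \text{ for } u, v \text{ gives}

u = \frac{(\mathbf{N}\times\mathbf{P})\cdot\mathbf{D}}
         {(\mathbf{M}\times\mathbf{N})\cdot\mathbf{D}},
\qquad
v = \frac{(\mathbf{P}\times\mathbf{M})\cdot\mathbf{D}}
         {(\mathbf{M}\times\mathbf{N})\cdot\mathbf{D}}

\text{so, with } \mathbf{A}=\mathbf{N}\times\mathbf{P},\quad
\mathbf{B}=\mathbf{P}\times\mathbf{M},\quad
\mathbf{C}=\mathbf{M}\times\mathbf{N}:

(H_a, O_a, V_a) = (A_x,\ A_y,\ -A_z),\quad
(H_b, O_b, V_b) = (B_x,\ B_y,\ -B_z),\quad
(H_c, O_c, V_c) = (C_x,\ C_y,\ -C_z)
```

The H components multiply i, the O components are the constant terms (D's y component is 1), and the V components pick up a minus sign because j runs down the screen while z runs up.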
So you've got your polygon 3D engine running, and
you'd like to start adding a bit of texture to your
flat- or Gouraud-shaded polygons. Well, it will make
it look a lot cooler. But let's point out the
disadvantages of texture mapping right away:

     It's slow
     Sometimes hard to see polygon edges

Each of these has certain ramifications on the
overall approach you want to take with your code,
which we'll come back to later, in sections 3 and 4.

Practical advice: Don't try to get your riproaringly
fast texture mapper running first. Get a very simple,
slow, "perfect" texture mapper working first, as described
in the first subsection below. This will allow you to make
sure you've gotten the equations right. Realize that I
can't present the equations appropriate to every scenario,
since there are simply too many spaces people can work
in. I've chosen to present the math from an extremely
simple coordinate space which keeps the texture mapping
relatively simple. You'll have to work out the correct
transformations to make it work right, and a slow but
correct texture mapping routine may help you tweak the
code as necessary to achieve this. Use very simple
polygons to start your testing; centered and facing the
viewer should be your very first one (if done correctly,
this will simply scale the texture).
Perfect Texture Mapping

To start with, we'll do slow but exact "perfect" texture
mapping of a square tile with a simple texture map mapped onto
it. The polygon is defined in three-space using four points,
and the texture map is 256x256 pixels. Note that this is all
talking about using floating point, so those of you working in
C or Pascal are fine. Those in assembly should realize that
you have to do a bit of extra work to use fixed point, or you
can beat out the floating point by hand if you want.

First we have to "map" the texture onto the polygon.
We have to define how the square texture map corresponds
to the square polygon. This is relatively simple. Let
one corner of the polygon be the origin (location <0,0>)
of the texture map. Let each of the other corners
correspond to corners just off the edge of the texture
map (locations <256, 0>, <256, 256>, and <0, 256>).

We'd like to use the equations in section 1, which
require vectors P, M, and N, where P is the origin,
and M & N are the axes for u&v (which are _roughly_
the coordinates in the texture map, but see below).
In other words, P, M, and N tell us where the texture
lies relative to the polygon. P is the coordinate in
three-space where the origin of the texture is. M
tells us which way the "horizontal" dimension of the
texture lies in three-space, and N the "vertical".

Suppose the polygon has four vertices V[0], V[1],
V[2], and V[3] (all four of these are vectors). Then,
P is simply V[0]. M is a vector going from the origin
to the corner <256, 0>, so M is a vector from V[0] to
V[1], so M = V[1] - V[0]. N is a vector from the origin
to the corner <0, 256>, so N is V[3] - V[0].

     P = V[0]
     M = V[1] - V[0]    { note these are vector subtractions }
     N = V[3] - V[0]
Again, remember that we need P, M, and N in the viewspace
discussed with the equation, so make sure you've transformed
the Vs appropriately, or you can compute P, M, and N in world
or object space and transform them into viewspace.

Compute the 9 magic numbers (vectors O, H, and V) as described
in section 1. Now, take your 3D polygon and process it as normal.
Scan convert it so that you have a collection of rows of pixels
to draw.

Now, iterate across each row. For each pixel in the polygon
whose screen coordinates are <i,j>, apply the rest of the math
described in section 1; that is, compute a, b, and c, and from
them compute <u,v>.

I said before that <u,v> are basically the texture map
coordinates. What they are in truth are the coordinates in texture
map space. Because of the way we defined texture map space,
we'll actually find that u and v both run from 0..1, not 0..256.
This is an advantage for descriptive purposes because u and v
are always 0 to 1, regardless of the size of the texture map.

So, to convert u&v to pixelmap coordinates, multiply them
both by 256. Now, use them as indices into the texture map,
output the value found there, and voila, you've texture mapped!
The loop should look something like this:

   __loop #1__

   for every j which is a row in the polygon
       screen = 0xA0000000 + 320*j
       for i = start_x to end_x for this row
           a = Oa + (Ha * i) + (Va * j)
           b = Ob + (Hb * i) + (Vb * j)
           c = Oc + (Hc * i) + (Vc * j)
           u = 256 * a / c
           v = 256 * b / c
           screen[i] = texture_map[v][u]
Once you've got that working, congratulations! You're
done dealing with the annoying messy part, which is getting
those 9 magic numbers computed right. The rest of this is
just hard grunt work and trickery trying to make the code
run fast.

From here on in, I'm only going to look at the inner
loop, that is, a single run (row or column), and let the
rest of the runs be understood.

Prepare to meet thy DOOM

This subsection is concerned with vastly speeding
up texture mapping by restricting the mapper to walls,
floors and ceilings, or what is commonly called
DOOM-style texture mapping, although it of course predates
DOOM (e.g. Ultima Underworld [*], Legends of Valour).

   [* Yes, Underworld allowed you to tilt the view,
   but it distorted badly. Underworld essentially used
   DOOM-style tmapping, and tried to just use that on
   arbitrarily-angled polygons. I can't even begin to
   guess what Underworld II was doing for the same thing.]
To begin with, let's take loop #1 and get as much
stuff out of the inner loop as we can, so we can see
what's going on. Note that I'm not going to do low-level
optimizations, just mathematical optimizations; I assume
you understand that array walks can be turned into
pointer walks, etc.

   __loop #2__

   a = Oa + (Ha * start_x) + (Va * j)
   b = Ob + (Hb * start_x) + (Vb * j)
   c = Oc + (Hc * start_x) + (Vc * j)
   for i = start_x to end_x
       u = 256 * a / c
       v = 256 * b / c
       screen[i] = texture_map[v][u]
       a += Ha
       b += Hb
       c += Hc

With fixed point math, the multiplies by 256
are really just shifts. Furthermore, they can be
"premultiplied" into a and b (and Ha and Hb) outside
the loop.

Ok, so what do we have left? Three integer adds,
a texture map lookup, and two extremely expensive fixed-point
divides. How can we get rid of the divides?
This is the big question in texture mapping, and most
answers to it are _approximate_. They give results that
are not quite the same as the above loop, but it is difficult
for the eye to tell the difference.

However, before we delve into these, there's a very
special case in which we can get rid of the divides.

We can move the divide by c out of the loop without
changing the results IFF c is constant for the duration
of the loop. This is true if Hc is 0. It turns out that
Hc is 0 if all the points on the run of pixels are the same
depth from the viewer, that is, they lie on a line of
so-called "constant Z" (I would call it "constant Y" in
my coordinate system).

The requirement that a horizontal line of pixels be the
same depth turns out to be met by ceilings and floors.
For ceilings and floors, Hc is 0, and so the loop can be
rewritten as:

   __loop #3__

   __setup from loop #2__
   u = 256 * a / c
   v = 256 * b / c
   du = 256 * Ha / c
   dv = 256 * Hb / c
   for i = start_x to end_x
       screen[i] = texture_map[v][u]
       u += du
       v += dv

Now _that_ can be a very fast loop, although adjusting
the u&v values so they can be used as indices has been
glossed over. I'll give some sample assembly in the next
section and make it all explicit.
First, though, let's look at walls. Walls are almost
identical to floors and ceilings. However, with walls,
Vc is 0, instead of Hc. This means that to write a loop
in which c is constant, we have to walk down columns instead
of across rows. This affects scan-conversion, of course.

The other thing about walls is that with floors, since
you can rotate about the vertical axis (Z axis for me, Y axis
for most of you), the horizontal runs on the floors cut
across the texture at arbitrary angles. Since you're
bound to not tilt your head up or down, and since the
polygons themselves aren't tilted, you generally find
that for walls, Va is 0 as well. In other words, as you
walk down a column of a wall texture, both a & c are constant,
so u is constant; you generally only change one coordinate,
v, in the texture map. This means the inner loop only needs
to update one variable, and can be made to run _very_ fast.
The only thing missing from this discussion for creating
a DOOM clone is how to do transparent walls, how to do
lighting, and how to make it fast enough. These will
be discussed in section 4, although some of the speed
issue is addressed by the inner loops in the next subsection,
and the rest of the speed issue is discussed in general
terms later on.

...wrapped around your finger...
So far, we've only looked at texture mapping a single
polygon. Of course, it's obvious how to texture map
a lot of polygons--just lather, rinse, repeat. But it
may seem sort of wasteful to go through all the 3D math
over and over again if we just want to have one
long wall with the same texture repeating over and over
again--like linoleum tiles or wallpaper.

Well, we don't have to. Let's think about this
idea of a "texture map space" some more. We defined
it as being a coordinate system "superimposed" on
the polygon that told us where the texture goes.
However, when we implemented it, we simply used the
polygon itself (in essence) as the coordinate space.

To see this, make a polygon which is a rectangle,
perhaps four times as long as it is tall. When it
is drawn, you will see the texture is distorted,
stretched out to four times its length in one dimension.
Suppose we wanted it to repeat four times instead?

The first step is to look at what the definition
of the texture map space means. The texture map space
shows how the physical pixelmap itself goes onto the
polygon. To get a repeating texture map, our first
step is to just get one of the copies right. If
we set up our P, M, & N so that the M only goes one
quarter of the way along the long edge of the
rectangle, we'll map the texture onto just that
quarter of the rectangle.

Here's a picture to explain it:

   [diagram: a wide rectangle A-B-C-D with E one quarter of the
   way along the top edge from A towards B; the texture (quadrants
   1, 2, 3, 4) fills A-E and repeats four times across to B]
So, we used to map (u,v)=(0,0) to A, (u,v)=(1,0) to B, and
(u,v)=(0,1) to D. This stretched the texture map out to
fill the entire polygon.

Now, instead, we map (u,v)=(1,0) to E. In other words,
let P = A, M = E-A, and N = D-A. In this new coordinate
space, we will map the texture onto the first quarter of
the polygon.

What about the rest of the polygon? Well, it simply
turns out that for the first quarter 0 <= u <= 1. For
the rest, 1 <= u <= 4.

To make the texture wrap around, all we have to do is
ignore the integer part of u, and look at the fractional part.
Thus, as u goes from 1 to 2, we lookup in the texture map using
the fractional part of u, or (u-1).

This is all very simple, and the upshot is that, once
you define P, M, and N correctly, you simply have to mask
your fixed-point u&v values; this is why we generally use
texture maps whose sides are powers of two, so that we can
mask to stay within the texture map. (Also because they
fit conveniently into segments this way, and also so that
the multiply to convert u&v values from 0..1 to indices
reduces to a shift.)
I'm assuming that's a sufficient explanation of the
idea for you to get it all set up. So here are the assembly
inner loops I promised. I'm not going to bother giving
the ultra-fast vertical wall-drawing case, just the
horizontal floor/ceiling-drawing case.

Note that a mask of 255 (i.e. for a 256x256 texture)
can be gotten for free; however, no program that I'm
aware of uses texture maps that large, since they
simply require too much storage, and they can cache very
poorly in the internal cache.

First, here's your basic floor/ceiling texture mapper,
in C, with wraparound, and explicitly using fixed point
math--but no setup.

   __loop #4__

   mask = 127, 63, 31, 15, whatever
   for (i=0; i < len; ++i) {
       temp = table[(v >> 16) & mask][(u >> 16) & mask];
       screen[i] = temp;
       u += du;
       v += dv;
   }
Now, here's an assembly one. This one avoids the
shifts and does both masks at the same time, and uses
16 bits of "fractional" precision and however many bits
are needed for the coordinates. Note that I assume
that the texture, even if it is 64x64, still has each
row starting 256 bytes apart. This just requires some
creative storage approaches, and is crucial for a fast
inner loop, since no shifting & masking is required to
assemble the index.

   __loop #5__

        mov   mask2,ax    ; setup mask to do both at the same time

loop5   and   bx,mask2    ; mask both coordinates
        mov   al,[bx]     ; fetch from the texture map
        stosb             ; write it to the screen
        add   dx,ha_low   ; update fraction part of u
        adc   bl,ha_hi    ; update index part of u
        add   si,hb_low   ; these are constant for the loop
        adc   bh,hb_hi    ; they should be on the stack
        loop  loop5       ; so that ds is free for the texture map
This code is decent, but nowhere near as fast as it can be.
The main trick to improving performance is to use 32-bit adds
instead of two adds. The problem with this is that extra operations
are required to set up the indexing into the texture map. Through
the use of the ADC trick, these can be minimized. In the following
code, BL and BH are unchanged. However, the top half of EBX now
contains what used to be in SI, and the other values have been
moved into registers. ESI contains hb_low in the top half, and
ha_hi in the low 8 bits. This means that ADC EBX,ESI achieves
the result of two of the additions above. Also, we start using
BP, so we move our variables into the data segment and keep the
texture map in its own segment register.

   __loop #6__

loop6   mov   al,[bx]     ; fetch from the texture map
        stosb             ; write it to the screen
        add   dx,bp       ; update fractional part of u
        adc   ebx,esi     ; update u (BL) and frac. part of v (EBX)
        adc   bh,ch       ; update index part of v
        and   bx,mask2    ; mask both coordinates
        loop  loop6
This is a bit faster, although it has one bug. It's
possible for the addition into BL to overflow into BH. It might
not seem to be, since BL is masked every iteration back down to
stay in 0..127, 0..63, or whatever. However, if the step is
negative, then BL will be decremented each iteration, and may
"underflow" and subtract one from BH. To handle this, you need
a separate version of the loop for those cases.

If you're not doing wraparound textures, you can speed the
loop up a bit more by removing the and. You can run entirely
from registers except for the texture map lookup. Additionally,
unrolling the loop once cuts down on loop overhead, and is crucial
if you're writing straight to the VGA, since it doubles your
throughput to a 16-bit VGA card.
Here's a very fast no-wraparound texture mapper. It uses
the ADC trick twice. Note that the carry flag is maintained
around the loop every iteration; unfortunately the 'and' required
for wraparound textures clears the carry flag (uselessly). EBX
and EDX contain u & v in their bottom 8 bits, and contain the
fractional parts of v & u in their top bits (note they keep the
_other_ coordinate's fractional parts). You have to have prepped
the carry flag first; if you can't figure this technique out, don't
sweat it, or look to see if someone else has a more clear discussion
of how to do fast fixed-point walks using 32-bit registers.

This loop is longer because it does two pixels at a time.

   __loop #7__

loop7   mov   al,[bx]     ; get first sample
        adc   edx,esi     ; update v-high and u-low
        adc   ebx,ebp     ; update u-high and v-low
        mov   bh,dl       ; move v-high into tmap lookup register
        mov   ah,[bx]     ; get second sample
        adc   edx,esi     ; same updates again for the second pixel
        adc   ebx,ebp
        mov   bh,dl
        mov   es:[di],ax  ; output both pixels
        inc   di          ; add 2 to di without disturbing carry
        inc   di
        loop  loop7
I went ahead and 486-optimized the stosw/loop at the end to make
cycle-counting easier. All of these instructions are single cycle
instructions, except the branch, and the segment-override. So you're
looking at roughly 15 cycles for every two pixels. Your caching
behavior on the reads and writes will determine the actual speed.
It can be unrolled another time to further reduce the loop overhead;
the core operations are 9 instructions (10 cycles) for every two
pixels. Note the "inc di/inc di", which protects the carry flag.
If you unroll it again, four "inc di"s will be required. Unroll it
another time, and you're better off saving the carry flag, adding,
and restoring, for example "adc ax,ax/add di,8/shr ax,1", rather
than a string of eight "inc di"s.

Lost My Shape (trying to act casual)
Non-rectangular polygons are trivial under this system.
Some approaches require you to specify the (u,v) coordinates
for each of the vertices of the polygon. With this technique,
you instead specify the 3D coordinates for three of the
"vertices" of the texture map. So the easiest way of handling
a texture on a complex polygon is simply to use a square
texture which is larger than the polygon. For example:

   [diagram: an irregular polygon inscribed in a square; three
   corners of the square are labeled P1, P2, and P3]

Now, we simply define the texture map such that P is P1,
M is P2-P1, and N is P3-P1. Then, if our texture looks like
this:

   [diagram: the texture, with '.' marking the texels that lie
   outside the polygon]

the regions marked by '.' in the texture map will
simply never be displayed anywhere on the polygon.

Wraparound textures can still be used as per normal,
and concave polygons require no special handling either.

Also, you can get special effects by having M and N not
be perpendicular to each other.
This section discusses some of the pitfalls and
things-aren't-quite-as-simple-as-they-sounded issues
that come up while texture mapping. All of the
information is, to some extent, important, whether
you've encountered this problem or not.

Cl-cl-cl-close to the Edge

At some time when you're texture mapping, you'll
discover (perhaps from the screen, perhaps from a
debugger) that your U & V values aren't within the
0..1 range; they'll be just outside it.

This is one of these "argh" problems. It is
possible through very very careful definition of
scan-conversion operations to avoid it, but you're
likely to encounter it.

If you use wraparound textures, you may not ever
notice it, however, since when it happens, the
texture will simply wraparound and display an
appropriate texel.

If not, you may get a black pixel, or just
garbage. It'll only happen at the edges of your
polygons, though.
The reason this happens is because your scan-conversion
algorithm may generate pixels "in the polygon" whose
pixel-centers (or corners, depending on how you've
defined it) are just outside the texture--that is,
they're outside the polygon itself.

The right solution to this is to fix your scan-converter.
If your texture mapper computes u&v coordinates based on
the top-left corner of the pixel (as the one I've defined
so far has), make sure your scan-converter only generates
pixels whose top-left corner is really within the polygon.
If you do this, you may need to make a minor change to my
definition of M & N, but I'm not going to discuss this
further, since you probably won't do this.

A second option is to define P, M, and N such that the
texture map space is slightly bigger than the polygon; that
is, so that if you go just off the edge of the polygon,
you'll still be within the texture map.

This is a pain since you end up having to transform extra
3D points to do it.

The third, and probably most common solution, is to always
use wraparound textures, which hide the problem, but prevent
you from using textures that have one edge that highly contrasts
with the opposite edge.

The fourth, and probably second most common solution,
and the one that turns out to be a real pain, is to "clamp"
the u&v values to be within the texture all the time.

Naively, you just put this in your inner loop:

   if (u < 0) u = 0; else if (u > 1) u = 1;
   if (v < 0) v = 0; else if (v > 1) v = 1;
Of course, you don't really do this, since it'd slow
you down far too much. You can do this outside the loop,
clamping your starting location for each run. However,
you can't, under this system, clamp the ending value
for the run.

Remember that in the loop we update u and v with
(essentially) Ha/c and Hb/c. These are constant
across the entire run, but not constant across the
entire polygon, because c has different values for
different runs.

We can compute du and dv in a different way to
allow for clamping. What we do is we explicitly compute
(a,b,c) at (start_x, j) as we did before, but we also
compute (a,b,c) at (end_x, j). From these we compute
(u,v) at start_x & at end_x. Next we clamp both sets
of u & v. Then we compute du and dv with

   du = (u2 - u1) / (end_x - start_x - 1)
   dv = (v2 - v1) / (end_x - start_x - 1)
This is slightly more expensive than the old way,
because we have to compute u2 and v2, which requires
extra divides. However, for methods that explicitly
calculate u&v sets and then compute deltas (and we'll
see some in section 4), this is the way to go.

One final thing you can do is interpolate the (a,b,c)
triple from the vertices as you scan convert. This will
guarantee that all (a,b,c) triples computed will lie
within the polygon, and no clamping will be necessary
(but deltas must still be computed as above). However,
you have to make sure the (a,b,c) values you compute at
the vertices are clamped themselves, which is not too hard
but a bit more complicated than clamping (u,v) values.
Out of This Domain -- Zero's Paradox

Divides by zero are ugly. We programmers don't
like them. If this were an ideal world (a quick
nod to mathematicians and some physicists), the
texture mapping equations would be divide-by-zero-free.

Unfortunately, it's a repercussion of the exact
same problem as above that you can bump into them.

Remember above, I noted that it's possible to
get (u,v) pairs with a value just outside of the
0..1 range, because a pixel we're texture mapping
isn't even in the polygon?

Well, even worse, it's possible for this pixel,
which isn't in the polygon, to be along the horizon
line (vanishing point) for the polygon. If this
happens, your Y value (sorry, Z for most of you)
would be infinite if you tried to compute the 3D
coordinates from the screen coordinates; and in
the (u,v) computation, you end up with a 0 value
for c. Since u = a/c, blammo, divide by 0.
Well, the solution is simple. Test if c is 0,
and if it is, don't divide. But what _should_ you do?

Well, let's look at an "even worse" case. Suppose
the pixel is so far off the polygon it's across the
horizon line. In this case, we'll end up with c having
the "wrong" sign, and while our divide won't fault on
us, our u&v values will be bogus.

What do we do then?

We can't clamp our a&b&c values very easily.
Fortunately, it turns out we don't have to. If
this happens, it means the edge of the polygon
must be very close to the horizon, or the viewer
must be very, very flat to the polygon (if you know
what I mean). If so, the viewer can't really tell
what should be "right" for the polygon, so if we
screw up the u&v values, it really doesn't matter.

So the answer is, don't worry if c gets the
wrong sign, and if c comes out to be 0, use any
value for u&v that you like--(0,0) makes an obvious
choice.
I've never had a serious problem with this, but
it is possible that this could actually give you some
pretty ugly results, if, say, two corners of a polygon
both "blew up", and you treated them both as being
(0,0). It can also cause problems with wraparound
polygons not repeating the right amount.

Most polygon 3D graphics engines probably use
the painter's algorithm for hidden surface removal.
You somehow figure out what order to paint the polygons
in (depth sort, BSP trees, whatever), and then paint
them back-to-front. The nearer polygons obscure the
farther ones, and voila!, you're done.

This works great, especially in a space combat
simulator, where it's rare that you paint lots of pixels.

You can texture map this way, too. For example,
Wing Commander II doesn't texture map, but it does
real time rotation, which involves essentially the
same inner loop. Wing Commander II is fast--until
a lot of ships are on the screen close to you, at which
point it bogs down a bit.
If you care about not slowing down too much in the above case,
or you want to do an "indoor" renderer with lots of hidden
surfaces, you'll find that with texture mapping you can
ill afford to use the painter's algorithm.

You pay a noticeable cost for every pixel you texture map. If
you end up hiding 80% of your surfaces (i.e. there are five
"layers" everywhere on the screen), you end up "wasting" 80%
of the time you spend on texture mapping.

To prevent this, you have to use more complex methods of
hidden surface removal. These will probably slow you down
somewhat, but you should make up for it with the gain in
texture mapping speed.

The essential idea is to texture map each screen pixel only
once. To do this, you do some sort of "front-to-back"
painting, where you draw the nearest surface first. Any pixel
touched by this surface should never be considered for drawing
again.

There are many ways to do this. You can process a single
scanline or column at a time and use ray-casting or just
"scanline processing", then resolve the overlap between the
runs with whatever method is appropriate. You can stay
polygonal and maintain "2D clipping" information (a data
structure which tracks which pixels have been drawn so far).
\r
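As an illustration of the principle (not from the article; real
engines use run/span structures rather than a per-pixel array),
here is the simplest possible "2D clipping" structure: a boolean
coverage array per scanline, so a pixel touched once is never
considered again:

```c
#include <string.h>

#define WIDTH 320

static unsigned char drawn[WIDTH];   /* 1 = pixel already painted */

/* Reset at the start of each scanline. */
static void clear_coverage(void) { memset(drawn, 0, sizeof(drawn)); }

/* Paint the run [x0,x1] front-to-back: only pixels not yet drawn
   get texture mapped.  Returns how many pixels were actually drawn. */
static int draw_run(int x0, int x1, unsigned char color,
                    unsigned char *scanline)
{
    int x, count = 0;
    for (x = x0; x <= x1; ++x) {
        if (!drawn[x]) {
            scanline[x] = color;   /* the expensive texture fetch
                                      would happen here */
            drawn[x] = 1;
            ++count;
        }
    }
    return count;
}
```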
Beyond getting a fast inner loop for texture mapping, getting
a fast hidden-surface-removal technique (and a fast
depth-sorting technique if appropriate) is probably the next
most crucial thing for your frame rate.

But the details are beyond the scope of this article.

Note that if you attempt to use a Z-buffer, you will still end
up paying all of the costs of texture mapping for every
forward-facing polygon (or at least 50% of them if you get
really tricky; if you get really, really tricky, the sky's the
limit). I strongly doubt that any PC game now out, or that
will come out in the next year, will render full-screen
texture mapping through a Z-buffer. (Superimposing a rendered
image on a Z-buffered background is a different issue, and is
no doubt done all the time.)

TXR 4. The Complexities

In this section we will discuss lots of miscellaneous topics.
We'll look at some more optimizations, such as considerations
for dealing with slow VGA cards, and how to texture map
arbitrarily-angled polygons without doing two divides per
pixel. We'll talk about a technique that lets you use
textures with high-frequency components, and one way to
integrate lighting into texture mapping.

TE Arbitrarily-Angled Polygons

First suggestion: don't. Set up your world to have all (or
mostly) walls and floors. Supporting arbitrarily-angled
polygons is going to slow you down, no matter what.

The original texture mapping loop, which supported
arbitrarily-angled polygons, required two divides per pixel.
We don't have to go that slow, but we'll never go as fast as
DOOM-style rendering can go. (However, as you start to use
more sophisticated lighting algorithms in your inner loop, the
cost of handling arbitrarily-angled polygons may start to
become less important.)
\r
There is one way to texture map such polygons "perfectly"
without two divides per pixel, and a host of ways to do it
"imperfectly". I'll discuss several of these ways in varying
amounts of detail. Your best bet is to implement them all and
see which ones you can get to run the fastest but still look
good. You might find that one is faster for some cases but
not for others. You could actually have an engine which uses
all the methods, depending on the polygon it's considering and
perhaps a "detail" setting which controls how accurate the
approximations are.

The "perfect" texture mapping algorithm is described in
another article, "Free-direction texture mapping". I'll
summarize the basic idea and the main flaw. The basic idea is
this. For ceilings and walls, we were able to walk along a
line on the screen for which the step in the "c" parameter was
0; this was a line of "constant Z" on the polygon.

It turns out that every polygon has lines of "constant
Z"--however, they can be at any angle, not necessarily
vertical or horizontal.

What this means, though, is that if you walk along those lines
instead of walking along a horizontal or vertical, you do not
need a divide to compute your texture map coordinates, just
deltas.

The details, including the slope of the line to walk on the
screen, can be found in the other article.
\r
Note, however, that the "DOOM" approach was _just_ an
optimization for a special case. The wall & ceiling renderers
produce exactly the same results as a perfect texture mapper,
for the polygons that they handle (ignoring rounding errors
and fixed-point precision effects). This is not true for the
"free-direction" texture mapper. While there is a line across
the screen for which the polygon has constant Z, you cannot
walk exactly along that line, since you must step by pixels.
The end result is that while in the texture map space you move
by even steps, in the screen space you move with ragged jumps.
With perfect texture mapping, you always sample from the
texture map from the position corresponding to the
top-left/center of each pixel. With the free-direction
mapper, you sample from a "random" location within the pixel,
depending on how you're stepping across the screen. This
"random" displacement is extremely systematic, and leads to a
systematic distortion of the texture. I find it visually
unacceptable with high-contrast textures, compared to perfect
texture mapping, but you should try it and decide for
yourself. The technically inclined should note that this is
simply the normal "floor" renderer with an extra 2D skew, and
that while 2D skews are trivial, they are non-exact and suffer
from the flaw described above.
\r
The only other alternative for arbitrarily-angled polygons is
to use some kind of approximation. We can characterize u and
v as functions of i (the horizontal screen position; or use
'j' if you wish to draw columns); for instance, u = a / c,
where a = q + i*Ha and c = p + i*Hc. So we can say
u(i) = (q + i*Ha) / (p + i*Hc).
\r
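Written out as a function (a toy version in doubles for clarity;
the real loops of course use fixed point), the exact mapping is:

```c
/* Exact u for horizontal screen position i, given the run's
   constants: q and p are the values of a and c at i == 0,
   Ha and Hc are the horizontal steps.  One divide per pixel. */
static double u_exact(double q, double Ha, double p, double Hc,
                      int i)
{
    return (q + i * Ha) / (p + i * Hc);
}
```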
Now, instead of computing u(i) exactly for each i, as we've
done until now, we can instead compute some function u'(i)
which is approximately equal to u(i) and which can be computed
much faster.

There are two straightforward functions which we can compute
very fast. One is the simple linear function we used for
DOOM-style mapping, u'(x) = r + x*s. Since the function we're
approximating is curved (a hyperbola), a curved function is
another possibility, such as u'(x) = r + x*s + x^2*t. (SGI's
Reality Engine apparently uses a cubic polynomial.)

If you try both of these approximations on a very large
polygon at a sharp angle, you will find that they're not very
good, and still cause visible curvature. They are, of course,
only approximations. The approximations can be improved with
a simple speed/quality trade-off through subdivision. The
idea of subdivision is that the approximation is always of
high quality for a small enough region, so you can simply
subdivide each region until the subregions are small enough to
have the desired quality.
\r
There are two ways to subdivide. One simple way is to
subdivide the entire polygon into smaller polygons. This
should be done on the fly, not ahead of time, because only
polygons that are at "bad" angles need a lot of subdivision.
After dividing a polygon into multiple smaller ones, render
each one separately. Use the original P, M, and N values for
all of the new polygons to make the texture remain where it
should be after subdivision.

The (probably) better way to subdivide is to subdivide runs
instead of polygons, and so I'll discuss this in more detail.
The essential thing is that to do an approximation, you
evaluate the original function at two or more locations and
then fit your approximate function to the computed values.
One advantage of run subdivision is that you can share points
that you evaluated for one subrun with those of the next.

First let's turn back to the two approximations under
consideration. The first is what is called "bilinear texture
mapping", because the function is linear and we're tracking
two ("bi") values. To use this method, we compute the
function at both endpoints: u1 = u(start_x), u2 = u(end_x).
Then we compute our start and step values. To keep things
simple, I'm going to assume the approximation function u'(x)
is defined from 0..end_x-start_x, not from start_x..end_x.

So, the linear function u'(x) = r + s*x, where u'(0) = u1 and
u'(end_x - start_x) = u2, is met by letting r = u1,
s = (u2 - u1) / (end_x - start_x).
\r
Now, suppose our run goes from x = 10 to x = 70. If we
evaluate u(10), u(20), u(30), u(40),... u(70), then we can
have six separate sections of bilinear interpolation.
\r
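A sketch of that subdivision in C (doubles instead of fixed
point for clarity; the true hyperbolic function
u(x) = (q + x*Ha)/(p + x*Hc) is evaluated only at the subrun
boundaries, and each boundary value is shared with the next
subrun):

```c
/* Approximate u over [start_x, end_x] with nsub linear pieces.
   Assumes (end_x - start_x) divides evenly by nsub.
   out[0..end_x-start_x] receives the approximated values. */
static void u_bilinear(double q, double Ha, double p, double Hc,
                       int start_x, int end_x, int nsub, double *out)
{
    int seg, i, x = start_x;
    int len = (end_x - start_x) / nsub;
    for (seg = 0; seg < nsub; ++seg) {
        double u1 = (q + x * Ha) / (p + x * Hc);
        double u2 = (q + (x + len) * Ha) / (p + (x + len) * Hc);
        double s  = (u2 - u1) / len;     /* step per pixel */
        for (i = 0; i < len; ++i)
            out[x - start_x + i] = u1 + i * s;
        x += len;                        /* u2 is reused as next u1 */
    }
    out[end_x - start_x] = (q + x * Ha) / (p + x * Hc);
}
```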
For a quadratic, there are several ways to compute it. One
way is to compute an additional sample in the middle;
u3 = u((start_x + end_x)/2). Then we can fit u1, u2, and u3
to u'(x) = r + s*x + t*x^2 with:

        len = (end_x - start_x)
        k   = u1 + u2 - u3*2
        r   = u1
        s   = (u2 - u1 - 2*k)/len
        t   = 2*k/(len*len)

Note that to use this in code, you cannot simply use a loop
like this:

        R = r;  S = s;  T = t;
        for (i=0; i < len; ++i) {
            ...use R...
            R += S;
            S += T;
        }

because the r, s, and t values aren't correct for discrete
advancement. To make them correct, do this during the setup
code:

        R = r;
        S = s + t;
        T = t*2;

Then the loop of (...use R..., R += S, S += T) will work
correctly.
\r
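One correct discretization (my derivation: since
u'(x+1) - u'(x) = s + t*(2x+1), the first forward difference
must start at s + t and itself advance by 2t) can be checked
against the closed form like this:

```c
/* Evaluate r + s*x + t*x*x for x = 0..len using only additions,
   with the discrete-advancement correction S = s + t, T = 2*t. */
static void quad_forward(double r, double s, double t, int len,
                         double *out)
{
    double R = r, S = s + t, T = 2.0 * t;
    int x;
    for (x = 0; x <= len; ++x) {
        out[x] = R;     /* exactly r + s*x + t*x*x */
        R += S;
        S += T;
    }
}
```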
The biquadratic loop will be slower than the linear loop, but
will look better with fewer subdivisions. You can share one
of the endpoints from one biquadratic section to the next.
Note, though, that you require twice as many calculations of
u&v values for the same number of subdivisions with a
biquadratic vs. a bilinear.

Another thing to do is to choose how to subdivide the run more
carefully. If you simply divide it in half or into quarters,
you'll discover that some of the subruns come out looking
better than others, so it's worth improving the subdivision
system. One thing you can do is to try to make most of your
subruns have lengths which are powers of two. This will let
you use shifts instead of divides when computing r, s, and t,
which cuts down on your overhead, which lets you use more
subdivisions at the same speed.

Note something very important. Subdivision increases the
overhead per run; biquadratic and other things increase the
cost of the inner loop. Before you go crazy trying to
optimize your arbitrarily-angled polygon renderer, make sure
you're rendering some "typical" scenes. The "right" answer is
going to depend on whether you have lots of very short runs or
fewer, longer runs. If you optimize based on a simple test
case, you may end up suboptimal on the final code.

You probably still want to have both a column-based and a
row-based renderer, and use whichever one the polygon is
"closer to" (e.g. if Hc is closer to 0 than Vc, use the
row-based). Note that the free-direction renderer looks its
worst (to me) for very small rotations, i.e. when Hc or Vc
are very close to 0. Since in these cases not much
subdivision is needed, even if you choose to use a
free-direction mapper as your primary renderer, you might
still want to have "almost wall" and "almost floor" renderers
as well.
\r
Finally, there is one more approximation method you can use,
which is faster than any of the ones discussed so far, but is
simply totally and utterly wrong. This is the approach used
by Michael Abrash in his graphics column in Dr. Dobb's
Journal. While it's quite wrong, it works on polygons which
are entirely constant Y (sorry, Z), and can be a noticeable
speedup.

What you do is 2D (instead of 3D) interpolation. You mark
each vertex with its coordinates in the texture map. Then
when you scan convert, you interpolate these values between
vertices on the edges of your runs. Thus, scan conversion
will generate runs with (u,v) values for the left and right
end. Now simply compute (du,dv) by subtracting and dividing
by the length (no clamping will be necessary), and use your
fast bilinear inner loop. When combined with 3D polygon
subdivision, this approach can actually be useful.
\r
When the player is moving, set your internal quality settings
a little lower. When the player stops, switch back to the
normal quality; if the player pauses the game, render one
frame in normal quality.

If done right, you can get a small boost to your fps without
anyone being able to tell that you did it. You may have to
use normal quality if the player is only moving very slowly,
as well.

Note that while this may sound like an utterly cheap trick
just to improve the on-paper fps number, it's actually quite
related to the "progressive refinement" approach used by some
real VR systems (which, when the viewer isn't moving, reuse
information from the previous frame to allow them to draw
successive frames with more detail).

There are a number of ways of improving this cheat
intelligently. If the player is moving parallel to a polygon,
that polygon will tend to be "stably" texture mapped (similar
mapping from frame to frame). If there is any distortion from
your approximation, this will be visible to the player. So a
rule of thumb is to only cheat (draw with above-average
distortion) on polygons that are not facing parallel to the
direction of motion of the player.
\r
If you're texture mapping, it's generally a good idea to light
your polygons. If you don't light them, then it may be
difficult to see the edge between two walls which have the
same texture (for instance, check out the "warehouse" section
of registered DOOM, which is sometimes confusing when a near
crate looks the same color as a far crate).

Lighting is actually pretty straightforward, although you take
a speed hit in your inner loop. I'm not going to worry about
actual lighting models and such; see other articles for
discussion on how to do light-sourced polygons.

Instead I'm going to assume you've computed the lighting
already. We'll start with "flat-run" shading, wherein an
entire run has the same intensity of light falling on it.

DOOM uses flat-run shading. A given polygon has a certain
amount of light hitting it, which is the same for the entire
polygon. In addition, each run of the polygon is sort-of lit
by the player. Since runs are always at a constant depth, you
can use constant lighting across the run and still change the
brightness with distance from the player (technically, DOOM
uses something that resembles black fog).

So the only real issue is _how_ you actually get the lighting
to affect the texture. Several approaches are possible, but
the only one that I think anyone actually uses is a lighting
table.

The lighting table is a 2D array. You use the light intensity
as one index, and the pixelmap color as the other index. You
look up in the table, and this gives you your final output
color to display. (With two tables, you can do simple
dithering.) So the only thing you have to do is precompute
this table.
\r
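As a sketch of building such a table (the palette here is
invented: assume 64 light levels and a grayscale ramp where the
brightness of color c is just c; a real game would instead find
the nearest entry in its actual palette):

```c
#define LEVELS 64
#define COLORS 256

static unsigned char light_table[LEVELS][COLORS];

/* light 0 = black, light LEVELS-1 = full brightness */
static void build_light_table(void)
{
    int light, color;
    for (light = 0; light < LEVELS; ++light)
        for (color = 0; color < COLORS; ++color)
            light_table[light][color] =
                (unsigned char)(color * light / (LEVELS - 1));
}
```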
Basically, your inner loop would look something like this:

        ...compute light...
        for (i=start_x; i <= end_x; ++i) {
            color = texture[v >> 16][u >> 16];
            output = light_table[light][color];
            screen[i] = output;
            u += du;
            v += dv;
        }

The next thing to consider is to Gouraud shade your texture
map. To do this, you need to compute the light intensity at
the left and right edge of the run; look elsewhere for more
details on Gouraud shading.

Once you've got that, you just do something like this:

        z  = light1 << 16;
        dz = ((light2 - light1) << 16) / (end_x - start_x);
        for (i=start_x; i <= end_x; ++i) {
            color = texture[v >> 16][u >> 16];
            output = light_table[z >> 16][color];
            screen[i] = output;
            u += du;
            v += dv;
            z += dz;
        }
\r
Note that you shouldn't really do this as I've written the
code. light1 and light2 should be calculated with 16 bits of
extra precision in the first place, rather than having to be
shifted left when computing z. I just did it that way so the
code would be self-contained.
\r
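In other words, the setup might look something like this (a
sketch only; light1_fx and light2_fx are hypothetical names for
edge-interpolated light values that already carry the 16 bits
of extra precision):

```c
/* Per-run Gouraud setup with light values already in 16.16
   fixed point, so no shifts are needed at run setup time. */
static long z, dz;

static void gouraud_setup(long light1_fx, long light2_fx,
                          int start_x, int end_x)
{
    z  = light1_fx;
    dz = (light2_fx - light1_fx) / (end_x - start_x);
}
```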
I'm going to attempt to give a reasonably fast assembly
version of this. However, there's a big problem with doing it
fast. The 80x86 only has one register that you can address
the individual bytes in and also use for indexing--BX. This
means that it's a real pain to make our inner loop alternate
texture map lookup and lighting fetch--whereas it's almost
trivial on a 680x0. I avoid this somewhat by processing two
pixels at a time: first doing two texture map lookups, then
doing two lighting lookups.

Here's a flat-shading inner loop. I'm doing this code off the
top of my head, so it may have bugs, but it's trying to show
at least one way you might try to do this. Since I use BP, I
put variables in the FS segment, which means DS points to the
texture, GS to the lighting table.

loop8   shr     ax,1            ; restore carry
        mov     cl,[bx]         ; get first sample, setting up cx for color lookup
        adc     edx,esi         ; update v-high and u-low
        adc     ebx,ebp         ; update u-high and v-low
        mov     bh,dl           ; move v-high into tmap lookup register
        mov     ah,[bx]         ; get second sample, save it in ah
        mov     dh,bl           ; save value of bl
        mov     bx,cx           ; use bx to address color map
        mov     al,gs:[bx]      ; lookup color for pixel 1
        mov     bl,ah           ; switch to pixel 2
        mov     ah,gs:[bx]      ; lookup color for pixel 2
        mov     es:[di],ax      ; output both pixels
        mov     bl,dh           ; restore bl from dh
        adc     ax,ax           ; save carry so we can do CMP
        cmp     di,fs:last_di   ; rather than having to decrement cx
\r
For a Gouraud shading inner loop, we now have three different
numbers u, v, and z, which we're all adding at every step. To
do this, we use THREE adc's, and we have to shuffle around
which high-bits correspond to which low-bits in a complex way.
I'll leave you to figure this out for yourself, but here's an
attempt at the inner loop.

loop9   shr     ax,1            ; restore carry
        mov     al,fs:[bx]      ; get first sample
        mov     ah,cl           ; save away current z-high into AH
                                ; this makes AX a value we want to look up
        adc     edx,esi         ; update v-high and u-low
        adc     ebx,ebp         ; update u-high and z-low
        adc     ecx,z_v_inc     ; update z-high and v-low
        mov     bh,dl           ; move v-high into tmap lookup register
        mov     ch,fs:[bx]      ; get second sample, save it in ch
        mov     dh,bl           ; save value of bl
        mov     al,gs:[bx]      ; lookup first color value
        mov     ah,gs:[bx]      ; lookup second color value
        mov     es:[di],ax      ; output both pixels
        mov     bl,dh           ; restore bl from dh
        adc     ax,ax           ; save carry
        cmp     di,last_di      ; rather than having to decrement cx
\r
Notice that both of these loops are significantly slower than
the original loop. I'm not personally aware of any generally
faster way to do this sort of thing (but the code can be
tweaked to be faster). The one exception is that for flat-run
shading, you could precompute the entire texture with the
right lighting. This would require a lot of storage, of
course, but if you view it as a cache, it would let you get
some reuse of information from frame to frame, since polygons
tend to be lit the same from frame to frame.
\r
Finally, here's a brief discussion of transparency. There are
two ways to get transparency effects. The first one is
slower, but more flexible. You use _another_ lookup table.
You have to paint the texture that is transparent after you've
drawn things behind it. Then, in the inner loop, you fetch
the texture value (and light it) to draw. Then you fetch the
pixel that's currently in that location. Look up in a
"transparency" table with those two values as indices, and
write out the result. The idea is that you do this:
table[new][old]. If new is a normal, opaque color, then
table[new][old] == new, for every value of old. If new is a
special "color" which is supposed to be transparent, then
table[new][old] == old, for every value of old. This causes
old to show through. In addition, you can have translucency
effects, where table[new][old] gives a mixture of the colors
of old and new. This will let you do effects like the
translucent ghosts in the Ultima Underworlds.
\r
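A sketch of building that table, assuming (my choice, not the
article's) that color 255 is the designated transparent
"color"; the translucent rows, which would mix palette
entries, are left out since they depend on your actual
palette:

```c
#define TRANSPARENT 255

static unsigned char trans_table[256][256];

/* trans_table[new][old]: opaque colors win, the transparent
   "color" lets the existing pixel show through. */
static void build_trans_table(void)
{
    int new_c, old_c;
    for (new_c = 0; new_c < 256; ++new_c)
        for (old_c = 0; old_c < 256; ++old_c)
            trans_table[new_c][old_c] =
                (new_c == TRANSPARENT) ? (unsigned char)old_c
                                       : (unsigned char)new_c;
}
```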
However, the above approach is extremely slow, since you have
to load the value from the pixel map and do the extra table
lookup. But it works for arbitrary polygons. DOOM only
allows transparency on walls, not on ceilings and floors.
Remember we noticed that the special thing about walls is that
u is constant as you draw a column from a wall; you are
walking down a column in the texture map at the same time you
are drawing a column on screen. What this means is that you
can use a data structure which encodes where the transparency
in each column of the texture map is, and use that _outside_
the inner loop to handle transparency. For example, your data
structure tells you that you have a run of 8 opaque pixels,
then 3 transparent pixels, then 5 more opaque ones. You scale
8, 3, and 5 by the rate at which you're walking over the
textures, and simply treat this as two separate opaque runs.

The details of this method depend on exactly how you're doing
your hidden surface removal, and since it doesn't generalize
to floors & ceilings, much less to arbitrarily-angled
polygons, I don't think going into further detail will be very
useful (I've never bothered writing such a thing, but I'm
pretty sure that's all there is to it).
\r
TE The Postman Always Rings Twice

If you're going to write to a slow 16-bit VGA card, you should
try your darndest to always write 2 pixels at a time.

For texture mapping, your best bet is to build your screen in
a buffer in RAM, and then copy it to the VGA all at once. You
can do this in Mode 13h or in Mode X or Y, as your heart
desires. You should definitely do this if you're painting
pixels more than once while drawing.

If, however, you wish to get a speedup by not paying for the
extra copy, you might like to write directly to the VGA card
from your inner loop.

You might not think this is very interesting. If the write to
the screen buffer in regular RAM is fast, how much can you
gain by doing both steps at once, instead of splitting them
into two?
\r
The reason it is interesting is because the VGA, while slow to
accept multiple writes, will let you continue doing processing
after a single write. What this means is that if you overlap
your texture mapping computation with your write to the VGA,
you can as much as double your speed on a slow VGA card. For
example, the fastest I can blast my slow VGA card is 45 fps.
I can texture map floor-style directly to it at 30 fps. If I
texture map to a memory buffer, this is still somewhat slow,
more than just the difference between the 30 and 45 fps
figures. Thus, my total rate if I write to an offscreen
buffer drops as low as 20 fps, depending on exactly what I do
in the texture map inner loop.

Ok, so, now suppose you've decided it might be a speedup to
write directly to the VGA. There are two problems. First of
all, if you're in mode X or Y, it's very difficult to write
two bytes at a time, which is necessary for this approach to
be a win. Second of all, even in mode 13h, it's difficult to
write two bytes at a time when you're drawing a column of
pixels.

I have no answer here. I expect people to stick to offscreen
buffers, or to simply process columns at a time and write (at
excruciatingly slow rates on some cards) to the VGA only one
byte at a time.

One option is to set up a page-flipping mode 13h (which is
possible on some VGA cards), and to paint two independent but
adjacent columns at the same time, so that you can write a
word at a time. I have a very simple demo that does the
latter, but it's not for the faint of heart, and I don't think
it's a win once you have a lot of small polygons.

Another answer is to have a DOOM-style "low-detail" mode which
computes one pixel, duplicates it, and always writes both
pixels at the same time.
\r
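That doubling is cheap to express in C (a sketch; it assumes
the little-endian byte order of the x86):

```c
/* Write one computed texel as two adjacent screen pixels with
   a single word-sized store. */
static void put_doubled(unsigned char pixel,
                        unsigned short *screen, int i)
{
    screen[i] = (unsigned short)(pixel | (pixel << 8));
}
```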
A final answer is just to ignore the market of people with
slow VGA cards. I wouldn't be surprised if this approach was
commonplace in a year or two. But if you do so with
commercial software, please put a notice of this requirement
on the box.

TE Mipmapping (or is it Mip-Mapping?)

Mipmapping is a very straightforward technique that can be
used to significantly improve the quality of your textures--so
much so that textures you could not otherwise use because they
look ugly become usable.

The problem that mipmapping addresses is as follows. When a
texture is far in the distance, such that its on-screen size
in pixels is significantly smaller than its actual size as a
texture, only a small number of its pixels will actually be
visible. If the texture contains areas with lots of rapidly
varying high-contrast data, the texture may look ugly, and,
most importantly, moire artifacts will occur. (To see this in
DOOM, try shrinking the screen to the smallest setting and
going outside in shareware DOOM. Many of the buildings will
show moire patterns. In registered DOOM, there is a
black-and-blue ceiling pattern which has very bad artifacts if
it is brightly lit. Go to the mission with the gigantic round
acid pool near the beginning. Cheat to get light amplification
goggles (or maybe invulnerability), and you'll see it.)

Mipmapping reduces these artifacts by precomputing some
"anti-aliased" textures and using them when the textures are
in the distance.

The basic idea is to substitute a texture map half as big when
the polygon is so small that only every other pixel is being
drawn anyway. This texture map contains one pixel for every
2x2 square in the original, and is the color average of those
pixels.

For a 64x64 texture map, you'd have the original map, a 32x32
map, a 16x16 map, an 8x8 map, etc.
\r
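Generating the next mip level is just that 2x2 averaging. A
sketch for 8-bit grayscale-style texels (for a paletted
texture you'd average in RGB space via the palette, then find
the nearest palette entry):

```c
/* Shrink a size x size texture to size/2 x size/2 by averaging
   each 2x2 block of texels.  src and dst are row-major. */
static void mip_shrink(const unsigned char *src,
                       unsigned char *dst, int size)
{
    int x, y, half = size / 2;
    for (y = 0; y < half; ++y)
        for (x = 0; x < half; ++x) {
            int sum = src[(2*y)   * size + 2*x]
                    + src[(2*y)   * size + 2*x + 1]
                    + src[(2*y+1) * size + 2*x]
                    + src[(2*y+1) * size + 2*x + 1];
            dst[y * half + x] = (unsigned char)(sum / 4);
        }
}
```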
The mipmaps will smear out colors and lose details. You can
best test them by forcing them to be displayed while they're
still close to you; once they appear to be working, set them
up as described above.

Mipmapping causes a somewhat ugly effect when you see the
textures switch from one mipmap to the next. However,
especially for some textures, it is far less ugly than the
effect you would get without them.

For example, a fine white-and-black checkerboard pattern
(perhaps with some overlaid text) would look very ugly without
mipmapping, as you would see random collections of white and
black pixels (which isn't too bad), and you would see curving
moire patterns (which is). With mipmapping, at a certain
distance the whole polygon would turn grey.

I do not believe any existing games for the PC use mipmapping.
However, examining the data file for the Amiga demo version of
Legends of Valour showed smaller copies of textures, which
made it look like mipmapping was being used.

Mipmapping requires 33% extra storage for the extra texture
maps (25% for the first, 25% of 25% for the second, etc.).

This may also be a good idea for 2D bitmaps which are scaled
(e.g. critters in Underworld & DOOM, or ships in Wing
Commander II--although none of those appeared to use it).

SGI's Reality Engine does mipmapping. Actually, it does a
texture map lookup on two of the mipmaps, the "closer" one and
the "farther" one, and uses a weighted average between them
depending on which size is closer to correct. (The RE also
does anti-aliasing, which helps even more.)
\r
TXR Where Do We Go From Here?

The above discussion mostly covers what is basically the state
of the art of texture mapping on the PC. Hopefully in the
future every game will be at least as fast as the inner loops
in this article allow.

As long as people want full-screen images, it'll be a while
before we have enough computational power to do more than
that. But if we did have more power, what else could we do
with it?

  o Colored lighting (requires complex lookup tables)
  o Phong shading (interpolation of normals--one sqrt() per pixel!)
  o Higher resolution (640x400, or 640x400 and anti-alias to 320x200)
  o A lot more polygons
  o Bump mapping (can be done today with huge amounts of precomputation)
\r