Written for the PC-GPE by Sean Barrett.

     +-----------------------------------------------+
     | THIS FILE MAY NOT BE DISTRIBUTED              |
     | SEPARATELY FROM THE ENTIRE PC-GPE COLLECTION. |
     +-----------------------------------------------+

-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=-

                          TEXTURE MAPPING
   almost everything you need to know to texture map with the PC

-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=- -=-=-=-=-=-

                 Copyright 1994 Sean Barrett.
                 Not for reproduction (electronic
                 or hardcopy) except for personal use.
Contents

   1. terminology and equations
      Perfect texture mapping
      Non-rectangular polygons
      going off the texture map
      it'll never be fast enough
      handling arbitrarily-angled polygons
      slow 16-bit VGA cards

   | Note: I am not providing any references as I simply
   | derived the math myself and worked out the various
   | techniques for myself (the 32-bit ADC trick was
   | pointed out to me in another context by TJC,
   | author of the Mars demo) over the last two years
   | (since Wolfenstein 3D and Underworld came out).
I assume a RIGHT-handed 3D coordinate system, with X positive
to the right, Y positive disappearing into the screen/distance,
and Z positive up. To adjust this to the typical left-handed
3D space, simply swap all the 3D Ys & Zs.

I assume the screen space is positive-X to the right,
positive-Y goes down. Adjust the signs as appropriate for
your own screen space.

I will present code and pseudo-code in C. I also include
some relatively tight inner loops written in assembly, but I'm
omitting the details of the loop setup. The inner loops, while
usually from real, working code, should generally be taken
as examples showing how fast it ought to be possible to run
a given task, not as necessarily perfect examples. I often
use 32-bit instructions (sorry 286 programmers) because they
can double the performance. However, I write in real mode,
because 16-bit addressing is often convenient for texture maps,
and it's straightforward to use segment registers as pointers
to texture maps. The translation to protected mode should not
prove problematic, but again, these should more be taken as
examples rather than simply being used directly. I optimize
for the 486, but I skip some obvious optimizations. For example,
I write "loop", because it's simpler and clearer; production
code for the 486 should explicitly decrement and then branch.
Similarly, I write "stosb", etc. etc.
1. Terminology and Equations

You really probably don't want to read this section first,
but rather refer back to it whenever you feel the need. So
skip up to section 2 and refer back as appropriate. I could've
made this an appendix, but it seems too important to put last.

texture: A texture is a pixelmap of colors which is mapped
     onto a polygon using "texture mapping". The size
     of the polygon has nothing to do with the size (the
     number of pixels) of the texture map.

run: A run is a row or column of pixels. Normal texture
     mapping routines process one run in an "inner loop".

arbitrarily-angled polygon: a polygon which isn't a floor,
     wall, or ceiling; technically, a polygon which isn't
     parallel to the X or Z axes (or X or Y axes in a
     Z-is-depth coordinate system).

texture space, polygon coordinate space:
     Since a texture is flat, or two-dimensional, the relation
     of the texture to the 3D world can be described with a
     special coordinate space known by one of these names.
     Because it is only 2D, the space can be characterized by
     the location of the texture space in 2D, and two 3D vectors
     which represent the axes of the coordinate space. Sometimes
     called "uv" space, because the names of the coordinates are
     u and v.

Notation:

     Vectors appear in all caps.
     Components of vectors are P = < Px, Py, Pz >.

     Certain variables have consistent usage:

        x,y,z are coordinates in three-space
        i,j   are screen coordinates
        u,v   are coordinates in texture space
        a,b,c are "magic coordinates" such that
              u = a/c and v = b/c

Don't let this scare you off! Go read section 2, and
come back to this when you're ready.
Let P, M, and N be vectors defining the texture space: P is the
origin, and M and N are the vectors for the u&v axes.

Assume these vectors are in _view space_, where view space is
defined as being the space in which the transformation from
3D to the 2D screen is simply

     i = x / y
     j = -z / y

In other words, you have to adjust P, M, and N to be relative
to the view, and if you have multiplications in your perspective
computation, you have to multiply the appropriate components of
P, M, and N to compute them. Note that since typically in 3D
up is positive, and in 2D down is positive, there may be a
multiplication by -1 that needs to go into Pz, Mz, Nz. Note that
this also assumes that (0,0) in screen space is at the center
of the screen. Since it's generally not, simply translate
your screen coordinates (i,j) so they are centered before applying
the texture mapping math (or if you're funky you can modify
your viewspace to pre-skew them).

For example, if your transforms are:

     i = Hscale * x / y + Hcenter
     j = -Vscale * z / y + Vcenter

then you should simply multiply Px, Mx, and Nx by Hscale,
and multiply Pz, Mz, and Nz by -Vscale. Then just remember
to subtract Hcenter and Vcenter from the i,j values right
before plugging them into the texture mapping equations.
We begin by computing 9 numbers which are constant
for texture mapping the entire polygon (O stands for
Origin, H for Horizontal, and V for Vertical; why I use
these names should become clear eventually).

Then, for a given screen location (i,j), the formula
for the texture space (u,v) coordinates corresponding to
that location is:

     a = Oa + i*Ha + j*Va
     b = Ob + i*Hb + j*Vb
     c = Oc + i*Hc + j*Vc

     u = a / c
     v = b / c
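The nine constants themselves can be written compactly as cross products of P, M, and N. The derivation below is my own reconstruction under the view-space conventions stated in this section (i = x/y, j = -z/y, centered screen); double-check the signs against your own screen transform before using it:

```latex
\mathbf{D} = \langle\, i,\ 1,\ -j \,\rangle
  \quad\text{(the ray through pixel } (i,j)\text{)}

\text{Solving } t\,\mathbf{D} = \mathbf{P} + u\,\mathbf{M} + v\,\mathbf{N}
  \text{ for } u, v \text{ gives}

u = \frac{(\mathbf{N}\times\mathbf{P})\cdot\mathbf{D}}
         {(\mathbf{M}\times\mathbf{N})\cdot\mathbf{D}},
\qquad
v = \frac{(\mathbf{P}\times\mathbf{M})\cdot\mathbf{D}}
         {(\mathbf{M}\times\mathbf{N})\cdot\mathbf{D}}

\text{so, with } \mathbf{A}=\mathbf{N}\times\mathbf{P},\quad
\mathbf{B}=\mathbf{P}\times\mathbf{M},\quad
\mathbf{C}=\mathbf{M}\times\mathbf{N}:

(H_a, O_a, V_a) = (A_x,\ A_y,\ -A_z),\quad
(H_b, O_b, V_b) = (B_x,\ B_y,\ -B_z),\quad
(H_c, O_c, V_c) = (C_x,\ C_y,\ -C_z)
```

The H components multiply i, the O components are the constant terms (D's y component is 1), and the V components pick up a minus sign because j runs down the screen while z runs up.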
So you've got your polygon 3D engine running, and
you'd like to start adding a bit of texture to your
flat- or Gouraud-shaded polygons. Well, it will make
it look a lot cooler. But let's point out the
disadvantages of texture mapping right away:

     It's slow
     Sometimes hard to see polygon edges

Each of these has certain ramifications on the
overall approach you want to take with your code,
which we'll come back to later, in sections 3 and 4.

Practical advice: Don't try to get your riproaringly
fast texture mapper running first. Get a very simple,
slow, "perfect" texture mapper working first, as described
in the first subsection below. This will allow you to make
sure you've gotten the equations right. Realize that I
can't present the equations appropriate to every scenario,
since there are simply too many spaces people can work
in. I've chosen to present the math from an extremely
simple coordinate space which keeps the texture mapping
relatively simple. You'll have to work out the correct
transformations to make it work right, and a slow but
correct texture mapping routine may help you tweak the
code as necessary to achieve this. Use very simple
polygons to start your testing; centered and facing the
viewer should be your very first one (if done correctly,
this will simply scale the texture).
Perfect Texture Mapping

To start with, we'll do slow but exact "perfect" texture
mapping of a square tile with a simple texture map mapped onto
it. The polygon is defined in three-space using four points,
and the texture map is 256x256 pixels. Note that this is all
talking about using floating point, so those of you working in
C or Pascal are fine. Those in assembly should realize that
you have to do a bit of extra work to use fixed point, or you
can beat out the floating point by hand if you want.

First we have to "map" the texture onto the polygon.
We have to define how the square texture map corresponds
to the square polygon. This is relatively simple. Let
one corner of the polygon be the origin (location <0,0>)
of the texture map. Let each of the other corners
correspond to corners just off the edge of the texture
map (locations <256, 0>, <256, 256>, and <0, 256>).

We'd like to use the equations in section 1, which
require vectors P, M, and N, where P is the origin,
and M & N are the axes for u&v (which are _roughly_
the coordinates in the texture map, but see below).
In other words, P, M, and N tell us where the texture
lies relative to the polygon. P is the coordinate in
three-space where the origin of the texture is. M
tells us which way the "horizontal" dimension of the
texture lies in three-space, and N the "vertical".

Suppose the polygon has four vertices V[0], V[1],
V[2], and V[3] (all four of these are vectors). Then,
P is simply V[0]. M is a vector going from the origin
to the corner <256, 0>, so M is a vector from V[0] to
V[1], so M = V[1] - V[0]. N is a vector from the origin
to the corner <0, 256>, so N is V[3] - V[0].

     P = V[0]
     M = V[1] - V[0]    { note these are vector subtractions }
     N = V[3] - V[0]
Again, remember that we need P, M, and N in the viewspace
discussed with the equation, so make sure you've transformed
the Vs appropriately, or you can compute P, M, and N in world
or object space and transform them into viewspace.

Compute the 9 magic numbers (vectors O, H, and V) as described
in section 1. Now, take your 3D polygon and process it as normal.
Scan convert it so that you have a collection of rows of pixels
to draw.

Now, iterate across each row. For each pixel in the polygon
whose screen coordinates are <i,j>, apply the rest of the math
described in section 1; that is, compute a, b, and c, and from
them compute <u,v>.

I said before that <u,v> are basically the texture map
coordinates. What they are in truth are the coordinates in texture
map space. Because of the way we defined texture map space,
we'll actually find that u and v both run from 0..1, not 0..256.
This is an advantage for descriptive purposes because u and v
are always 0 to 1, regardless of the size of the texture map.

So, to convert u&v to pixelmap coordinates, multiply them
both by 256. Now, use them as indices into the texture map,
output the value found there, and voila, you've texture mapped!
The loop should look something like this:

   __loop #1__

   for every j which is a row in the polygon
       screen = 0xA0000000 + 320*j
       for i = start_x to end_x for this row
           a = Oa + (Ha * i) + (Va * j)
           b = Ob + (Hb * i) + (Vb * j)
           c = Oc + (Hc * i) + (Vc * j)
           u = 256 * a / c
           v = 256 * b / c
           screen[i] = texture_map[v][u]
Once you've got that working, congratulations! You're
done dealing with the annoying messy part, which is getting
those 9 magic numbers computed right. The rest of this is
just hard grunt work and trickery trying to make the code
run fast.

From here on in, I'm only going to look at the inner
loop, that is, a single run (row or column), and let the
rest of the runs be understood.

Prepare to meet thy DOOM

This subsection is concerned with vastly speeding
up texture mapping by restricting the mapper to walls,
floors and ceilings, or what is commonly called
DOOM-style texture mapping, although it of course predates
DOOM (e.g. Ultima Underworld [*], Legends of Valour).

   [* Yes, Underworld allowed you to tilt the view,
   but it distorted badly. Underworld essentially used
   DOOM-style tmapping, and tried to just use that on
   arbitrarily-angled polygons. I can't even begin to
   guess what Underworld II was doing for the same thing.]
To begin with, let's take loop #1 and get as much
stuff out of the inner loop as we can, so we can see
what's going on. Note that I'm not going to do low-level
optimizations, just mathematical optimizations; I assume
you understand that array walks can be turned into
pointer walks, etc.

   __loop #2__

   a = Oa + (Ha * start_x) + (Va * j)
   b = Ob + (Hb * start_x) + (Vb * j)
   c = Oc + (Hc * start_x) + (Vc * j)
   for i = start_x to end_x
       u = 256 * a / c
       v = 256 * b / c
       screen[i] = texture_map[v][u]
       a += Ha
       b += Hb
       c += Hc

With fixed point math, the multiplies by 256
are really just shifts. Furthermore, they can be
"premultiplied" into a and b (and Ha and Hb) outside
the loop.

Ok, so what do we have left? Three integer adds,
a texture map lookup, and two extremely expensive fixed-point
divides. How can we get rid of the divides?
This is the big question in texture mapping, and most
answers to it are _approximate_. They give results that
are not quite the same as the above loop, but it is difficult
for the eye to tell the difference.

However, before we delve into these, there's a very
special case in which we can get rid of the divides.

We can move the divide by c out of the loop without
changing the results IFF c is constant for the duration
of the loop. This is true if Hc is 0. It turns out that
Hc is 0 if all the points on the run of pixels are the same
depth from the viewer, that is, they lie on a line of
so-called "constant Z" (I would call it "constant Y" in
my coordinate system).

The requirement that a horizontal line of pixels be the
same depth turns out to be met by ceilings and floors.
For ceilings and floors, Hc is 0, and so the loop can be
rewritten as:

   __loop #3__

   __setup from loop #2__
   u = 256 * a / c
   v = 256 * b / c
   du = 256 * Ha / c
   dv = 256 * Hb / c
   for i = start_x to end_x
       screen[i] = texture_map[v][u]
       u += du
       v += dv

Now _that_ can be a very fast loop, although adjusting
the u&v values so they can be used as indices has been
glossed over. I'll give some sample assembly in the next
section and make it all explicit.
First, though, let's look at walls. Walls are almost
identical to floors and ceilings. However, with walls,
Vc is 0, instead of Hc. This means that to write a loop
in which c is constant, we have to walk down columns instead
of across rows. This affects scan-conversion, of course.

The other thing about walls is that with floors, since
you can rotate about the vertical axis (Z axis for me, Y axis
for most of you), the horizontal runs on the floors cut
across the texture at arbitrary angles. Since you're
bound to not tilt your head up or down, and since the
polygons themselves aren't tilted, you generally find
that for walls, Va is 0 as well. In other words, as you
walk down a column of a wall texture, both a & c are constant,
so u is constant; you generally only change one coordinate,
v, in the texture map. This means the inner loop only needs
to update one variable, and can be made to run _very_ fast.
The only thing missing from this discussion for creating
a DOOM clone is how to do transparent walls, how to do
lighting, and how to make it fast enough. These will
be discussed in section 4, although some of the speed
issue is addressed by the inner loops in the next subsection,
and the rest of the speed issue is discussed in general
terms later on.

...wrapped around your finger...
So far, we've only looked at texture mapping a single
polygon. Of course, it's obvious how to texture map
a lot of polygons--just lather, rinse, repeat. But it
may seem sort of wasteful to go through all the 3D math
over and over again if we just want to have one
long wall with the same texture repeating over and over
again--like linoleum tiles or wallpaper.

Well, we don't have to. Let's think about this
idea of a "texture map space" some more. We defined
it as being a coordinate system "superimposed" on
the polygon that told us where the texture goes.
However, when we implemented it, we simply used the
polygon itself (in essence) as the coordinate space.

To see this, make a polygon which is a rectangle,
perhaps four times as long as it is tall. When it
is drawn, you will see the texture is distorted,
stretched out to four times its length in one dimension.
Suppose we wanted it to repeat four times instead?

The first step is to look at what the definition
of the texture map space means. The texture map space
shows how the physical pixelmap itself goes onto the
polygon. To get a repeating texture map, our first
step is to just get one of the copies right. If
we set up our P, M, & N so that the M only goes one
quarter of the way along the long edge of the
rectangle, we'll map the texture onto just that
quarter of the rectangle.

Here's a picture to explain it:

   [diagram: a wide rectangle A-B-C-D with E one quarter of the
   way along the top edge from A towards B; the texture (quadrants
   1, 2, 3, 4) fills A-E and repeats four times across to B]
So, we used to map (u,v)=(0,0) to A, (u,v)=(1,0) to B, and
(u,v)=(0,1) to D. This stretched the texture map out to
fill the entire polygon.

Now, instead, we map (u,v)=(1,0) to E. In other words,
let P = A, M = E-A, and N = D-A. In this new coordinate
space, we will map the texture onto the first quarter of
the polygon.

What about the rest of the polygon? Well, it simply
turns out that for the first quarter 0 <= u <= 1. For
the rest, 1 <= u <= 4.

To make the texture wrap around, all we have to do is
ignore the integer part of u, and look at the fractional part.
Thus, as u goes from 1 to 2, we lookup in the texture map using
the fractional part of u, or (u-1).

This is all very simple, and the upshot is that, once
you define P, M, and N correctly, you simply have to mask
your fixed-point u&v values; this is why we generally use
texture maps whose sides are powers of two, so that we can
mask to stay within the texture map. (Also because they
fit conveniently into segments this way, and also so that
the multiply to convert u&v values from 0..1 to indices
reduces to a shift.)
I'm assuming that's a sufficient explanation of the
idea for you to get it all set up. So here are the assembly
inner loops I promised. I'm not going to bother giving
the ultra-fast vertical wall-drawing case, just the
horizontal floor/ceiling-drawing case.

Note that a mask of 255 (i.e. for a 256x256 texture)
can be gotten for free; however, no program that I'm
aware of uses texture maps that large, since they
simply require too much storage, and they can cache very
poorly in the internal cache.

First, here's your basic floor/ceiling texture mapper,
in C, with wraparound, and explicitly using fixed point
math--but no setup.

   __loop #4__

   mask = 127, 63, 31, 15, whatever
   for (i=0; i < len; ++i) {
       temp = table[(v >> 16) & mask][(u >> 16) & mask];
       screen[i] = temp;
       u += du;
       v += dv;
   }
Now, here's an assembly one. This one avoids the
shifts and does both masks at the same time, and uses
16 bits of "fractional" precision and however many bits
are needed for the coordinates. Note that I assume
that the texture, even if it is 64x64, still has each
row starting 256 bytes apart. This just requires some
creative storage approaches, and is crucial for a fast
inner loop, since no shifting & masking is required to
assemble the index.

   __loop #5__

        mov   mask2,ax    ; setup mask to do both at the same time

loop5   and   bx,mask2    ; mask both coordinates
        mov   al,[bx]     ; fetch from the texture map
        stosb             ; write it to the screen
        add   dx,ha_low   ; update fraction part of u
        adc   bl,ha_hi    ; update index part of u
        add   si,hb_low   ; these are constant for the loop
        adc   bh,hb_hi    ; they should be on the stack
        loop  loop5       ; so that ds is free for the texture map
This code is decent, but nowhere near as fast as it can be.
The main trick to improving performance is to use 32-bit adds
instead of two adds. The problem with this is that extra operations
are required to set up the indexing into the texture map. Through
the use of the ADC trick, these can be minimized. In the following
code, BL and BH are unchanged. However, the top half of EBX now
contains what used to be in SI, and the other values have been
moved into registers. ESI contains hb_low in the top half, and
ha_hi in the low 8 bits. This means that ADC EBX,ESI achieves
the result of two of the additions above. Also, we start using
BP, so we move our variables into the data segment and keep the
texture map in its own segment register.

   __loop #6__

loop6   mov   al,[bx]     ; fetch from the texture map
        stosb             ; write it to the screen
        add   dx,bp       ; update fractional part of u
        adc   ebx,esi     ; update u (BL) and frac. part of v (EBX)
        adc   bh,ch       ; update index part of v
        and   bx,mask2    ; mask both coordinates
        loop  loop6
This is a bit faster, although it has one bug. It's
possible for the addition into BL to overflow into BH. It might
not seem to be, since BL is masked every iteration back down to
stay in 0..127, 0..63, or whatever. However, if the step is
negative, then BL will be decremented each iteration, and may
"underflow" and subtract one from BH. To handle this, you need
a separate version of the loop for those cases.

If you're not doing wraparound textures, you can speed the
loop up a bit more by removing the and. You can run entirely
from registers except for the texture map lookup. Additionally,
unrolling the loop once cuts down on loop overhead, and is crucial
if you're writing straight to the VGA, since it doubles your
throughput to a 16-bit VGA card.
Here's a very fast no-wraparound texture mapper. It uses
the ADC trick twice. Note that the carry flag is maintained
around the loop every iteration; unfortunately the 'and' required
for wraparound textures clears the carry flag (uselessly). EBX
and EDX contain u & v in their bottom 8 bits, and contain the
fractional parts of v & u in their top bits (note they keep the
_other_ coordinate's fractional parts). You have to have prepped
the carry flag first; if you can't figure this technique out, don't
sweat it, or look to see if someone else has a more clear discussion
of how to do fast fixed-point walks using 32-bit registers.

This loop is longer because it does two pixels at a time.

   __loop #7__

loop7   mov   al,[bx]     ; get first sample
        adc   edx,esi     ; update v-high and u-low
        adc   ebx,ebp     ; update u-high and v-low
        mov   bh,dl       ; move v-high into tmap lookup register
        mov   ah,[bx]     ; get second sample
        adc   edx,esi     ; same updates again for the second pixel
        adc   ebx,ebp
        mov   bh,dl
        mov   es:[di],ax  ; output both pixels
        inc   di          ; add 2 to di without disturbing carry
        inc   di
        loop  loop7
I went ahead and 486-optimized the stosw/loop at the end to make
cycle-counting easier. All of these instructions are single cycle
instructions, except the branch, and the segment-override. So you're
looking at roughly 15 cycles for every two pixels. Your caching
behavior on the reads and writes will determine the actual speed.
It can be unrolled another time to further reduce the loop overhead;
the core operations are 9 instructions (10 cycles) for every two
pixels. Note the "inc di/inc di", which protects the carry flag.
If you unroll it again, four "inc di"s will be required. Unroll it
another time, and you're better off saving the carry flag, adding,
and restoring, for example "adc ax,ax/add di,8/shr ax,1", rather
than a string of eight "inc di"s.

Lost My Shape (trying to act casual)
Non-rectangular polygons are trivial under this system.
Some approaches require you to specify the (u,v) coordinates
for each of the vertices of the polygon. With this technique,
you instead specify the 3D coordinates for three of the
"vertices" of the texture map. So the easiest way of handling
a texture on a complex polygon is simply to use a square
texture which is larger than the polygon. For example:

   [diagram: an irregular polygon inscribed in a square; three
   corners of the square are labeled P1, P2, and P3]

Now, we simply define the texture map such that P is P1,
M is P2-P1, and N is P3-P1. Then, if our texture looks like
this:

   [diagram: the texture, with '.' marking the texels that lie
   outside the polygon]

the regions marked by '.' in the texture map will
simply never be displayed anywhere on the polygon.

Wraparound textures can still be used as per normal,
and concave polygons require no special handling either.

Also, you can get special effects by having M and N not
be perpendicular to each other.
This section discusses some of the pitfalls and
things-aren't-quite-as-simple-as-they-sounded issues
that come up while texture mapping. All of the
information is, to some extent, important, whether
you've encountered this problem or not.

Cl-cl-cl-close to the Edge

At some time when you're texture mapping, you'll
discover (perhaps from the screen, perhaps from a
debugger) that your U & V values aren't within the
0..1 range; they'll be just outside it.

This is one of these "argh" problems. It is
possible through very very careful definition of
scan-conversion operations to avoid it, but you're
likely to encounter it.

If you use wraparound textures, you may not ever
notice it, however, since when it happens, the
texture will simply wraparound and display an
appropriate texel.

If not, you may get a black pixel, or just
garbage. It'll only happen at the edges of your
polygons, though.
The reason this happens is because your scan-conversion
algorithm may generate pixels "in the polygon" whose
pixel-centers (or corners, depending on how you've
defined it) are just outside the texture--that is,
they're outside the polygon itself.

The right solution to this is to fix your scan-converter.
If your texture mapper computes u&v coordinates based on
the top-left corner of the pixel (as the one I've defined
so far has), make sure your scan-converter only generates
pixels whose top-left corner is really within the polygon.
If you do this, you may need to make a minor change to my
definition of M & N, but I'm not going to discuss this
further, since you probably won't do this.

A second option is to define P, M, and N such that the
texture map space is slightly bigger than the polygon; that
is, so that if you go just off the edge of the polygon,
you'll still be within the texture map.

This is a pain since you end up having to transform extra
3D points to do it.

The third, and probably most common solution, is to always
use wraparound textures, which hide the problem, but prevent
you from using textures that have one edge that highly contrasts
with the opposite edge.

The fourth, and probably second most common solution,
and the one that turns out to be a real pain, is to "clamp"
the u&v values to be within the texture all the time.

Naively, you just put this in your inner loop:

   if (u < 0) u = 0; else if (u > 1) u = 1;
   if (v < 0) v = 0; else if (v > 1) v = 1;
Of course, you don't really do this, since it'd slow
you down far too much. You can do this outside the loop,
clamping your starting location for each run. However,
you can't, under this system, clamp the ending value
for the run.

Remember that in the loop we update u and v with
(essentially) Ha/c and Hb/c. These are constant
across the entire run, but not constant across the
entire polygon, because c has different values for
different runs.

We can compute du and dv in a different way to
allow for clamping. What we do is we explicitly compute
(a,b,c) at (start_x, j) as we did before, but we also
compute (a,b,c) at (end_x, j). From these we compute
(u,v) at start_x & at end_x. Next we clamp both sets
of u & v. Then we compute du and dv with

   du = (u2 - u1) / (end_x - start_x - 1)
   dv = (v2 - v1) / (end_x - start_x - 1)
This is slightly more expensive than the old way,
because we have to compute u2 and v2, which requires
extra divides. However, for methods that explicitly
calculate u&v sets and then compute deltas (and we'll
see some in section 4), this is the way to go.

One final thing you can do is interpolate the (a,b,c)
triple from the vertices as you scan convert. This will
guarantee that all (a,b,c) triples computed will lie
within the polygon, and no clamping will be necessary
(but deltas must still be computed as above). However,
you have to make sure the (a,b,c) values you compute at
the vertices are clamped themselves, which is not too hard
but a bit more complicated than clamping (u,v) values.
Out of This Domain -- Zero's Paradox

Divides by zero are ugly. We programmers don't
like them. If this were an ideal world (a quick
nod to mathematicians and some physicists), the
texture mapping equations would be divide-by-zero-free.

Unfortunately, it's a repercussion of the exact
same problem as above that you can bump into them.

Remember above, I noted that it's possible to
get (u,v) pairs with a value just outside of the
0..1 range, because a pixel we're texture mapping
isn't even in the polygon?

Well, even worse, it's possible for this pixel,
which isn't in the polygon, to be along the horizon
line (vanishing point) for the polygon. If this
happens, your Y value (sorry, Z for most of you)
would be infinite if you tried to compute the 3D
coordinates from the screen coordinates; and in
the (u,v) computation, you end up with a 0 value
for c. Since u = a/c, blammo, divide by 0.
Well, the solution is simple. Test if c is 0,
and if it is, don't divide. But what _should_ you do?

Well, let's look at an "even worse" case. Suppose
the pixel is so far off the polygon it's across the
horizon line. In this case, we'll end up with c having
the "wrong" sign, and while our divide won't fault on
us, our u&v values will be bogus.

What do we do then?

We can't clamp our a&b&c values very easily.
Fortunately, it turns out we don't have to. If
this happens, it means the edge of the polygon
must be very close to the horizon, or the viewer
must be very, very flat to the polygon (if you know
what I mean). If so, the viewer can't really tell
what should be "right" for the polygon, so if we
screw up the u&v values, it really doesn't matter.

So the answer is, don't worry if c gets the
wrong sign, and if c comes out to be 0, use any
value for u&v that you like--(0,0) makes an obvious
choice.
I've never had a serious problem with this, but
it is possible that this could actually give you some
pretty ugly results, if, say, two corners of a polygon
both "blew up", and you treated them both as being
(0,0). It can also cause problems with wraparound
polygons not repeating the right amount.

Most polygon 3D graphics engines probably use
the painter's algorithm for hidden surface removal.
You somehow figure out what order to paint the polygons
in (depth sort, BSP trees, whatever), and then paint
them back-to-front. The nearer polygons obscure the
farther ones, and voila!, you're done.

This works great, especially in a space combat
simulator, where it's rare that you paint lots of pixels.

You can texture map this way, too. For example,
Wing Commander II doesn't texture map, but it does
real time rotation, which involves essentially the
same inner loop. Wing Commander II is fast--until
a lot of ships are on the screen close to you, at which
point it bogs down a bit.
If you care about not slowing down too much in the above case,
or you want to do an "indoor" renderer with lots of hidden
surfaces, you'll find that with texture mapping you can
ill afford to use the painter's algorithm.

You pay a noticeable cost for every pixel you texture map. If
you end up hiding 80% of your surfaces (i.e. there are five
"layers" everywhere on the screen), you end up "wasting" 80%
of the time you spend on texture mapping.

To prevent this, you have to use more complex methods of
hidden surface removal. These will probably slow you down
somewhat, but you should make up for it with the gain in
texture mapping speed.

The essential idea is to texture map each screen pixel only
once. To do this, you do some sort of "front-to-back"
painting, where you draw the nearest surface first. Any pixel
touched by this surface should never be considered for drawing
again.

There are many ways to do this. You can process a single
scanline or column at a time and use ray-casting or just
"scanline processing", then resolve the overlap between the
runs with whatever method is appropriate. You can stay
polygonal and maintain "2D clipping" information (a data
structure which tracks which pixels have been drawn so far).
\r
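As an illustration of the principle (not from the article; real
engines use run/span structures rather than a per-pixel array),
here is the simplest possible "2D clipping" structure: a boolean
coverage array per scanline, so a pixel touched once is never
considered again:

```c
#include <string.h>

#define WIDTH 320

static unsigned char drawn[WIDTH];   /* 1 = pixel already painted */

/* Reset at the start of each scanline. */
static void clear_coverage(void) { memset(drawn, 0, sizeof(drawn)); }

/* Paint the run [x0,x1] front-to-back: only pixels not yet drawn
   get texture mapped.  Returns how many pixels were actually drawn. */
static int draw_run(int x0, int x1, unsigned char color,
                    unsigned char *scanline)
{
    int x, count = 0;
    for (x = x0; x <= x1; ++x) {
        if (!drawn[x]) {
            scanline[x] = color;   /* the expensive texture fetch
                                      would happen here */
            drawn[x] = 1;
            ++count;
        }
    }
    return count;
}
```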
Beyond getting a fast inner loop for texture mapping, getting
a fast hidden-surface-removal technique (and a fast
depth-sorting technique if appropriate) is probably the next
most crucial thing for your frame rate.

But the details are beyond the scope of this article.

Note that if you attempt to use a Z-buffer, you will still end
up paying all of the costs of texture mapping for every
forward-facing polygon (or at least 50% of them if you get
really tricky; if you get really, really tricky, the sky's the
limit). I strongly doubt that any PC game now out, or that
will come out in the next year, will render full-screen
texture mapping through a Z-buffer. (Superimposing a rendered
image on a Z-buffered background is a different issue, and is
no doubt done all the time.)

TXR 4. The Complexities

In this section we will discuss lots of miscellaneous topics.
We'll look at some more optimizations, such as considerations
for dealing with slow VGA cards, and how to texture map
arbitrarily-angled polygons without doing two divides per
pixel. We'll talk about a technique that lets you use
textures with high-frequency components, and one way to
integrate lighting into texture mapping.

TE Arbitrarily-Angled Polygons

First suggestion: don't. Set up your world to have all (or
mostly) walls and floors. Supporting arbitrarily-angled
polygons is going to slow you down, no matter what.

The original texture mapping loop, which supported
arbitrarily-angled polygons, required two divides per pixel.
We don't have to go that slow, but we'll never go as fast as
DOOM-style rendering can go. (However, as you start to use
more sophisticated lighting algorithms in your inner loop, the
cost of handling arbitrarily-angled polygons may start to
become less important.)
\r
There is one way to texture map such polygons "perfectly"
without two divides per pixel, and a host of ways to do it
"imperfectly". I'll discuss several of these ways in varying
amounts of detail. Your best bet is to implement them all and
see which ones you can get to run the fastest but still look
good. You might find that one is faster for some cases but
not for others. You could actually have an engine which uses
all the methods, depending on the polygon it's considering and
perhaps a "detail" setting which controls how accurate the
approximations are.

The "perfect" texture mapping algorithm is described in
another article, "Free-direction texture mapping". I'll
summarize the basic idea and the main flaw. The basic idea is
this. For ceilings and walls, we were able to walk along a
line on the screen for which the step in the "c" parameter was
0; this was a line of "constant Z" on the polygon.

It turns out that every polygon has lines of "constant
Z"--however, they can be at any angle, not necessarily
vertical or horizontal.

What this means, though, is that if you walk along those lines
instead of walking along a horizontal or vertical, you do not
need a divide to compute your texture map coordinates, just
deltas.

The details, including the slope of the line to walk on the
screen, can be found in the other article.
\r
Note, however, that the "DOOM" approach was _just_ an
optimization for a special case. The wall & ceiling renderers
produce exactly the same results as a perfect texture mapper,
for the polygons that they handle (ignoring rounding errors
and fixed-point precision effects). This is not true for the
"free-direction" texture mapper. While there is a line across
the screen for which the polygon has constant Z, you cannot
walk exactly along that line, since you must step by pixels.
The end result is that while in the texture map space you move
by even steps, in the screen space you move with ragged jumps.
With perfect texture mapping, you always sample from the
texture map from the position corresponding to the
top-left/center of each pixel. With the free-direction
mapper, you sample from a "random" location within the pixel,
depending on how you're stepping across the screen. This
"random" displacement is extremely systematic, and leads to a
systematic distortion of the texture. I find it visually
unacceptable with high-contrast textures, compared to perfect
texture mapping, but you should try it and decide for
yourself. The technically inclined should note that this is
simply the normal "floor" renderer with an extra 2D skew, and
that while 2D skews are trivial, they are non-exact and suffer
from the flaw described above.
\r
The only other alternative for arbitrarily-angled polygons is
to use some kind of approximation. We can characterize u and
v as functions of i (the horizontal screen position; or use
'j' if you wish to draw columns); for instance, u = a / c,
where a = q + i*Ha and c = p + i*Hc. So we can say
u(i) = (q + i*Ha) / (p + i*Hc).
\r
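Written out as a function (a toy version in doubles for clarity;
the real loops of course use fixed point), the exact mapping is:

```c
/* Exact u for horizontal screen position i, given the run's
   constants: q and p are the values of a and c at i == 0,
   Ha and Hc are the horizontal steps.  One divide per pixel. */
static double u_exact(double q, double Ha, double p, double Hc,
                      int i)
{
    return (q + i * Ha) / (p + i * Hc);
}
```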
Now, instead of computing u(i) exactly for each i, as we've
done until now, we can instead compute some function u'(i)
which is approximately equal to u(i) and which can be computed
much faster.

There are two straightforward functions which we can compute
very fast. One is the simple linear function we used for
DOOM-style mapping, u'(x) = r + x*s. Since the function we're
approximating is curved (a hyperbola), a curved function is
another possibility, such as u'(x) = r + x*s + x^2*t. (SGI's
Reality Engine apparently uses a cubic polynomial.)

If you try both of these approximations on a very large
polygon at a sharp angle, you will find that they're not very
good, and still cause visible curvature. They are, of course,
only approximations. The approximations can be improved with
a simple speed/quality trade-off through subdivision. The
idea of subdivision is that the approximation is always of
high quality for a small enough region, so you can simply
subdivide each region until the subregions are small enough to
have the desired quality.
\r
There are two ways to subdivide. One simple way is to
subdivide the entire polygon into smaller polygons. This
should be done on the fly, not ahead of time, because only
polygons that are at "bad" angles need a lot of subdivision.
After dividing a polygon into multiple smaller ones, render
each one separately. Use the original P, M, and N values for
all of the new polygons to make the texture remain where it
should be after subdivision.

The (probably) better way to subdivide is to subdivide runs
instead of polygons, and so I'll discuss this in more detail.
The essential thing is that to do an approximation, you
evaluate the original function at two or more locations and
then fit your approximate function to the computed values.
One advantage of run subdivision is that you can share points
that you evaluated for one subrun with those of the next.

First let's turn back to the two approximations under
consideration. The first is what is called "bilinear texture
mapping", because the function is linear and we're tracking
two ("bi") values. To use this method, we compute the
function at both endpoints: u1 = u(start_x), u2 = u(end_x).
Then we compute our start and step values. To keep things
simple, I'm going to assume the approximation function u'(x)
is defined from 0..end_x-start_x, not from start_x..end_x.

So, the linear function u'(x) = r + s*x, where u'(0) = u1 and
u'(end_x - start_x) = u2, is met by letting r = u1,
s = (u2 - u1) / (end_x - start_x).
\r
Now, suppose our run goes from x = 10 to x = 70. If we
evaluate u(10), u(20), u(30), u(40),... u(70), then we can
have six separate sections of bilinear interpolation.
\r
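A sketch of that subdivision in C (doubles instead of fixed
point for clarity; the true hyperbolic function
u(x) = (q + x*Ha)/(p + x*Hc) is evaluated only at the subrun
boundaries, and each boundary value is shared with the next
subrun):

```c
/* Approximate u over [start_x, end_x] with nsub linear pieces.
   Assumes (end_x - start_x) divides evenly by nsub.
   out[0..end_x-start_x] receives the approximated values. */
static void u_bilinear(double q, double Ha, double p, double Hc,
                       int start_x, int end_x, int nsub, double *out)
{
    int seg, i, x = start_x;
    int len = (end_x - start_x) / nsub;
    for (seg = 0; seg < nsub; ++seg) {
        double u1 = (q + x * Ha) / (p + x * Hc);
        double u2 = (q + (x + len) * Ha) / (p + (x + len) * Hc);
        double s  = (u2 - u1) / len;     /* step per pixel */
        for (i = 0; i < len; ++i)
            out[x - start_x + i] = u1 + i * s;
        x += len;                        /* u2 is reused as next u1 */
    }
    out[end_x - start_x] = (q + x * Ha) / (p + x * Hc);
}
```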
For a quadratic, there are several ways to compute it. One
way is to compute an additional sample in the middle;
u3 = u((start_x + end_x)/2). Then we can fit u1, u2, and u3
to u'(x) = r + s*x + t*x^2 with:

        len = (end_x - start_x)
        k   = u1 + u2 - u3*2
        r   = u1
        s   = (u2 - u1 - 2*k)/len
        t   = 2*k/(len*len)

Note that to use this in code, you cannot simply use a loop
like this:

        R = r;  S = s;  T = t;
        for (i=0; i < len; ++i) {
            ...use R...
            R += S;
            S += T;
        }

because the r, s, and t values aren't correct for discrete
advancement. To make them correct, do this during the setup
code:

        R = r;
        S = s + t;
        T = t*2;

Then the loop of (...use R..., R += S, S += T) will work
correctly.
\r
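One correct discretization (my derivation: since
u'(x+1) - u'(x) = s + t*(2x+1), the first forward difference
must start at s + t and itself advance by 2t) can be checked
against the closed form like this:

```c
/* Evaluate r + s*x + t*x*x for x = 0..len using only additions,
   with the discrete-advancement correction S = s + t, T = 2*t. */
static void quad_forward(double r, double s, double t, int len,
                         double *out)
{
    double R = r, S = s + t, T = 2.0 * t;
    int x;
    for (x = 0; x <= len; ++x) {
        out[x] = R;     /* exactly r + s*x + t*x*x */
        R += S;
        S += T;
    }
}
```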
The biquadratic loop will be slower than the linear loop, but
will look better with fewer subdivisions. You can share one
of the endpoints from one biquadratic section to the next.
Note, though, that you require twice as many calculations of
u&v values for the same number of subdivisions with a
biquadratic vs. a bilinear.

Another thing to do is to choose how to subdivide the run more
carefully. If you simply divide it in half or into quarters,
you'll discover that some of the subruns come out looking
better than others, so it's worth improving the subdivision
system. One thing you can do is to try to make most of your
subruns have lengths which are powers of two. This will let
you use shifts instead of divides when computing r, s, and t,
which cuts down on your overhead, which lets you use more
subdivisions at the same speed.

Note something very important. Subdivision increases the
overhead per run; biquadratic and other things increase the
cost of the inner loop. Before you go crazy trying to
optimize your arbitrarily-angled polygon renderer, make sure
you're rendering some "typical" scenes. The "right" answer is
going to depend on whether you have lots of very short runs or
fewer, longer runs. If you optimize based on a simple test
case, you may end up suboptimal on the final code.

You probably still want to have both a column-based and a
row-based renderer, and use whichever one the polygon is
"closer to" (e.g. if Hc is closer to 0 than Vc, use the
row-based). Note that the free-direction renderer looks its
worst (to me) for very small rotations, i.e. when Hc or Vc
are very close to 0. Since in these cases not much
subdivision is needed, even if you choose to use a
free-direction mapper as your primary renderer, you might
still want to have "almost wall" and "almost floor" renderers
as well.
\r
Finally, there is one more approximation method you can use,
which is faster than any of the ones discussed so far, but is
simply totally and utterly wrong. This is the approach used
by Michael Abrash in his graphics column in Dr. Dobb's
Journal. While it's quite wrong, it works on polygons which
are entirely constant Y (sorry, Z), and can be a noticeable
speedup.

What you do is 2D (instead of 3D) interpolation. You mark
each vertex with its coordinates in the texture map. Then
when you scan convert, you interpolate these values between
vertices on the edges of your runs. Thus, scan conversion
will generate runs with (u,v) values for the left and right
end. Now simply compute (du,dv) by subtracting and dividing
by the length (no clamping will be necessary), and use your
fast bilinear inner loop. When combined with 3D polygon
subdivision, this approach can actually be useful.
\r
When the player is moving, set your internal quality settings
a little lower. When the player stops, switch back to the
normal quality; if the player pauses the game, render one
frame in normal quality.

If done right, you can get a small boost to your fps without
anyone being able to tell that you did it. You may have to
use normal quality if the player is only moving very slowly,
as well.

Note that while this may sound like an utterly cheap trick
just to improve the on-paper fps number, it's actually quite
related to the "progressive refinement" approach used by some
real VR systems (which, when the viewer isn't moving, reuse
information from the previous frame to allow them to draw
successive frames with more detail).

There are a number of ways of improving this cheat
intelligently. If the player is moving parallel to a polygon,
that polygon will tend to be "stably" texture mapped (similar
mapping from frame to frame). If there is any distortion from
your approximation, this will be visible to the player. So a
rule of thumb is to only cheat (draw with above-average
distortion) on polygons that are not facing parallel to the
direction of motion of the player.
\r
If you're texture mapping, it's generally a good idea to light
your polygons. If you don't light them, then it may be
difficult to see the edge between two walls which have the
same texture (for instance, check out the "warehouse" section
of registered DOOM, which is sometimes confusing when a near
crate looks the same color as a far crate).

Lighting is actually pretty straightforward, although you take
a speed hit in your inner loop. I'm not going to worry about
actual lighting models and such; see other articles for
discussion on how to do light-sourced polygons.

Instead I'm going to assume you've computed the lighting
already. We'll start with "flat-run" shading, wherein an
entire run has the same intensity of light falling on it.

DOOM uses flat-run shading. A given polygon has a certain
amount of light hitting it, which is the same for the entire
polygon. In addition, each run of the polygon is sort-of lit
by the player. Since runs are always at a constant depth, you
can use constant lighting across the run and still change the
brightness with distance from the player (technically, DOOM
uses something that resembles black fog).

So the only real issue is _how_ you actually get the lighting
to affect the texture. Several approaches are possible, but
the only one that I think anyone actually uses is a lighting
table.

The lighting table is a 2D array. You use the light intensity
as one index, and the pixelmap color as the other index. You
look up in the table, and this gives you your final output
color to display. (With two tables, you can do simple
dithering.) So the only thing you have to do is precompute
this table.
\r
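As a sketch of building such a table (the palette here is
invented: assume 64 light levels and a grayscale ramp where the
brightness of color c is just c; a real game would instead find
the nearest entry in its actual palette):

```c
#define LEVELS 64
#define COLORS 256

static unsigned char light_table[LEVELS][COLORS];

/* light 0 = black, light LEVELS-1 = full brightness */
static void build_light_table(void)
{
    int light, color;
    for (light = 0; light < LEVELS; ++light)
        for (color = 0; color < COLORS; ++color)
            light_table[light][color] =
                (unsigned char)(color * light / (LEVELS - 1));
}
```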
Basically, your inner loop would look something like this:

        ...compute light...
        for (i=start_x; i <= end_x; ++i) {
            color = texture[v >> 16][u >> 16];
            output = light_table[light][color];
            screen[i] = output;
            u += du;
            v += dv;
        }

The next thing to consider is to Gouraud shade your texture
map. To do this, you need to compute the light intensity at
the left and right edge of the run; look elsewhere for more
details on Gouraud shading.

Once you've got that, you just do something like this:

        z  = light1 << 16;
        dz = ((light2 - light1) << 16) / (end_x - start_x);
        for (i=start_x; i <= end_x; ++i) {
            color = texture[v >> 16][u >> 16];
            output = light_table[z >> 16][color];
            screen[i] = output;
            u += du;
            v += dv;
            z += dz;
        }
\r
Note that you shouldn't really do this as I've written the
code. light1 and light2 should be calculated with 16 bits of
extra precision in the first place, rather than having to be
shifted left when computing z. I just did it that way so the
code would be self-contained.
\r
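In other words, the setup might look something like this (a
sketch only; light1_fx and light2_fx are hypothetical names for
edge-interpolated light values that already carry the 16 bits
of extra precision):

```c
/* Per-run Gouraud setup with light values already in 16.16
   fixed point, so no shifts are needed at run setup time. */
static long z, dz;

static void gouraud_setup(long light1_fx, long light2_fx,
                          int start_x, int end_x)
{
    z  = light1_fx;
    dz = (light2_fx - light1_fx) / (end_x - start_x);
}
```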
I'm going to attempt to give a reasonably fast assembly
version of this. However, there's a big problem with doing it
fast. The 80x86 only has one register that you can address
the individual bytes in and also use for indexing--BX. This
means that it's a real pain to make our inner loop alternate
texture map lookup and lighting fetch--whereas it's almost
trivial on a 680x0. I avoid this somewhat by processing two
pixels at a time: first doing two texture map lookups, then
doing two lighting lookups.

Here's a flat-shading inner loop. I'm doing this code off the
top of my head, so it may have bugs, but it's trying to show
at least one way you might try to do this. Since I use BP, I
put variables in the FS segment, which means DS points to the
texture, GS to the lighting table.

loop8   shr     ax,1            ; restore carry
        mov     cl,[bx]         ; get first sample, setting up cx for color lookup
        adc     edx,esi         ; update v-high and u-low
        adc     ebx,ebp         ; update u-high and v-low
        mov     bh,dl           ; move v-high into tmap lookup register
        mov     ah,[bx]         ; get second sample, save it in ah
        mov     dh,bl           ; save value of bl
        mov     bx,cx           ; use bx to address color map
        mov     al,gs:[bx]      ; lookup color for pixel 1
        mov     bl,ah           ; switch to pixel 2
        mov     ah,gs:[bx]      ; lookup color for pixel 2
        mov     es:[di],ax      ; output both pixels
        mov     bl,dh           ; restore bl from dh
        adc     ax,ax           ; save carry so we can do CMP
        cmp     di,fs:last_di   ; rather than having to decrement cx
\r
For a Gouraud shading inner loop, we now have three different
numbers u, v, and z, which we're all adding at every step. To
do this, we use THREE adc's, and we have to shuffle around
which high-bits correspond to which low-bits in a complex way.
I'll leave you to figure this out for yourself, but here's an
attempt at the inner loop.

loop9   shr     ax,1            ; restore carry
        mov     al,fs:[bx]      ; get first sample
        mov     ah,cl           ; save away current z-high into AH
                                ; this makes AX a value we want to look up
        adc     edx,esi         ; update v-high and u-low
        adc     ebx,ebp         ; update u-high and z-low
        adc     ecx,z_v_inc     ; update z-high and v-low
        mov     bh,dl           ; move v-high into tmap lookup register
        mov     ch,fs:[bx]      ; get second sample, save it in ch
        mov     dh,bl           ; save value of bl
        mov     al,gs:[bx]      ; lookup first color value
        mov     ah,gs:[bx]      ; lookup second color value
        mov     es:[di],ax      ; output both pixels
        mov     bl,dh           ; restore bl from dh
        adc     ax,ax           ; save carry
        cmp     di,last_di      ; rather than having to decrement cx
\r
Notice that both of these loops are significantly slower than
the original loop. I'm not personally aware of any generally
faster way to do this sort of thing (but the code can be
tweaked to be faster). The one exception is that for flat-run
shading, you could precompute the entire texture with the
right lighting. This would require a lot of storage, of
course, but if you view it as a cache, it would let you get
some reuse of information from frame to frame, since polygons
tend to be lit the same from frame to frame.
\r
Finally, here's a brief discussion of transparency. There are
two ways to get transparency effects. The first one is
slower, but more flexible. You use _another_ lookup table.
You have to paint the texture that is transparent after you've
drawn things behind it. Then, in the inner loop, you fetch
the texture value (and light it) to draw. Then you fetch the
pixel that's currently in that location. Look up in a
"transparency" table with those two values as indices, and
write out the result. The idea is that you do this:
table[new][old]. If new is a normal, opaque color, then
table[new][old] == new, for every value of old. If new is a
special "color" which is supposed to be transparent, then
table[new][old] == old, for every value of old. This causes
old to show through. In addition, you can have translucency
effects, where table[new][old] gives a mixture of the colors
of old and new. This will let you do effects like the
translucent ghosts in the Ultima Underworlds.
\r
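A sketch of building that table, assuming (my choice, not the
article's) that color 255 is the designated transparent
"color"; the translucent rows, which would mix palette
entries, are left out since they depend on your actual
palette:

```c
#define TRANSPARENT 255

static unsigned char trans_table[256][256];

/* trans_table[new][old]: opaque colors win, the transparent
   "color" lets the existing pixel show through. */
static void build_trans_table(void)
{
    int new_c, old_c;
    for (new_c = 0; new_c < 256; ++new_c)
        for (old_c = 0; old_c < 256; ++old_c)
            trans_table[new_c][old_c] =
                (new_c == TRANSPARENT) ? (unsigned char)old_c
                                       : (unsigned char)new_c;
}
```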
However, the above approach is extremely slow, since you have
to load the value from the pixel map and do the extra table
lookup. But it works for arbitrary polygons. DOOM only
allows transparency on walls, not on ceilings and floors.
Remember we noticed that the special thing about walls is that
u is constant as you draw a column from a wall; you are
walking down a column in the texture map at the same time you
are drawing a column on screen. What this means is that you
can use a data structure which encodes where the transparency
in each column of the texture map is, and use that _outside_
the inner loop to handle transparency. For example, your data
structure tells you that you have a run of 8 opaque pixels,
then 3 transparent pixels, then 5 more opaque ones. You scale
8, 3, and 5 by the rate at which you're walking over the
textures, and simply treat this as two separate opaque runs.

The details of this method depend on exactly how you're doing
your hidden surface removal, and since it doesn't generalize
to floors & ceilings, much less to arbitrarily-angled
polygons, I don't think going into further detail will be very
useful (I've never bothered writing such a thing, but I'm
pretty sure that's all there is to it).
\r
TE The Postman Always Rings Twice

If you're going to write to a slow 16-bit VGA card, you should
try your darndest to always write 2 pixels at a time.

For texture mapping, your best bet is to build your screen in
a buffer in RAM, and then copy it to the VGA all at once. You
can do this in Mode 13h or in Mode X or Y, as your heart
desires. You should definitely do this if you're painting
pixels more than once while drawing.

If, however, you wish to get a speedup by not paying for the
extra copy, you might like to write directly to the VGA card
from your inner loop.

You might not think this is very interesting. If the write to
the screen buffer in regular RAM is fast, how much can you
gain by doing both steps at once, instead of splitting them
into two?
\r
The reason it is interesting is because the VGA, while slow to
accept multiple writes, will let you continue doing processing
after a single write. What this means is that if you overlap
your texture mapping computation with your write to the VGA,
you can as much as double your speed on a slow VGA card. For
example, the fastest I can blast my slow VGA card is 45 fps.
I can texture map floor-style directly to it at 30 fps. If I
texture map to a memory buffer, this is still somewhat slow,
more than just the difference between the 30 and 45 fps
figures. Thus, my total rate if I write to an offscreen
buffer drops as low as 20 fps, depending on exactly what I do
in the texture map inner loop.

Ok, so, now suppose you've decided it might be a speedup to
write directly to the VGA. There are two problems. First of
all, if you're in mode X or Y, it's very difficult to write
two bytes at a time, which is necessary for this approach to
be a win. Second of all, even in mode 13h, it's difficult to
write two bytes at a time when you're drawing a column of
pixels.

I have no answer here. I expect people to stick to offscreen
buffers, or to simply process columns at a time and write (at
excruciatingly slow rates on some cards) to the VGA only one
byte at a time.

One option is to set up a page-flipping mode 13h (which is
possible on some VGA cards), and to paint two independent but
adjacent columns at the same time, so that you can write a
word at a time. I have a very simple demo that does the
latter, but it's not for the faint of heart, and I don't think
it's a win once you have a lot of small polygons.

Another answer is to have a DOOM-style "low-detail" mode which
computes one pixel, duplicates it, and always writes both
pixels at the same time.
\r
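That doubling is cheap to express in C (a sketch; it assumes
the little-endian byte order of the x86):

```c
/* Write one computed texel as two adjacent screen pixels with
   a single word-sized store. */
static void put_doubled(unsigned char pixel,
                        unsigned short *screen, int i)
{
    screen[i] = (unsigned short)(pixel | (pixel << 8));
}
```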
A final answer is just to ignore the market of people with
slow VGA cards. I wouldn't be surprised if this approach was
commonplace in a year or two. But if you do so with
commercial software, please put a notice of this requirement
on the box.

TE Mipmapping (or is it Mip-Mapping?)

Mipmapping is a very straightforward technique that can be
used to significantly improve the quality of your textures--so
much so that textures you could not otherwise use because they
look ugly become usable.

The problem that mipmapping addresses is as follows. When a
texture is far in the distance, such that its on-screen size
in pixels is significantly smaller than its actual size as a
texture, only a small number of its pixels will actually be
visible. If the texture contains areas with lots of rapidly
varying high-contrast data, the texture may look ugly, and,
most importantly, moire artifacts will occur. (To see this in
DOOM, try shrinking the screen to the smallest setting and
going outside in shareware DOOM. Many of the buildings will
show moire patterns. In registered DOOM, there is a
black-and-blue ceiling pattern which has very bad artifacts if
it is brightly lit. Go to the mission with the gigantic round
acid pool near the beginning. Cheat to get light amplification
goggles (or maybe invulnerability), and you'll see it.)

Mipmapping reduces these artifacts by precomputing some
"anti-aliased" textures and using them when the textures are
in the distance.

The basic idea is to substitute a texture map half as big when
the polygon is so small that only every other pixel is being
drawn anyway. This texture map contains one pixel for every
2x2 square in the original, and is the color average of those
pixels.

For a 64x64 texture map, you'd have the original map, a 32x32
map, a 16x16 map, an 8x8 map, etc.
\r
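Generating the next mip level is just that 2x2 averaging. A
sketch for 8-bit grayscale-style texels (for a paletted
texture you'd average in RGB space via the palette, then find
the nearest palette entry):

```c
/* Shrink a size x size texture to size/2 x size/2 by averaging
   each 2x2 block of texels.  src and dst are row-major. */
static void mip_shrink(const unsigned char *src,
                       unsigned char *dst, int size)
{
    int x, y, half = size / 2;
    for (y = 0; y < half; ++y)
        for (x = 0; x < half; ++x) {
            int sum = src[(2*y)   * size + 2*x]
                    + src[(2*y)   * size + 2*x + 1]
                    + src[(2*y+1) * size + 2*x]
                    + src[(2*y+1) * size + 2*x + 1];
            dst[y * half + x] = (unsigned char)(sum / 4);
        }
}
```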
The mipmaps will smear out colors and lose details. You can
best test them by forcing them to be displayed while they're
still close to you; once they appear to be working, set them
up as described above.

Mipmapping causes a somewhat ugly effect when you see the
textures switch from one mipmap to the next. However,
especially for some textures, it is far less ugly than the
effect you would get without them.

For example, a fine white-and-black checkerboard pattern
(perhaps with some overlaid text) would look very ugly without
mipmapping, as you would see random collections of white and
black pixels (which isn't too bad), and you would see curving
moire patterns (which is). With mipmapping, at a certain
distance the whole polygon would turn grey.

I do not believe any existing games for the PC use mipmapping.
However, examining the data file for the Amiga demo version of
Legends of Valour showed smaller copies of textures, which
made it look like mipmapping was being used.

Mipmapping requires 33% extra storage for the extra texture
maps (25% for the first, 25% of 25% for the second, etc.).

This may also be a good idea for 2D bitmaps which are scaled
(e.g. critters in Underworld & DOOM, or ships in Wing
Commander II--although none of those appeared to use it).

SGI's Reality Engine does mipmapping. Actually, it does a
texture map lookup on two of the mipmaps, the "closer" one and
the "farther" one, and uses a weighted average between them
depending on which size is closer to correct. (The RE also
does anti-aliasing, which helps even more.)
\r
TXR Where Do We Go From Here?

The above discussion mostly covers what is basically the state
of the art of texture mapping on the PC. Hopefully in the
future every game will be at least as fast as the inner loops
in this article allow.

As long as people want full-screen images, it'll be a while
before we have enough computational power to do more than
that. But if we did have more power, what else could we do
with it?

  o Colored lighting (requires complex lookup tables)
  o Phong shading (interpolation of normals--one sqrt() per pixel!)
  o Higher resolution (640x400, or 640x400 and anti-alias to 320x200)
  o A lot more polygons
  o Bump mapping (can be done today with huge amounts of precomputation)
\r