11-28-08 - Historic Capitol Hill

One of my favorite things to do here is just walk around the neighborhood. The streets are lovely; not exactly tree-lined, because Seattleites have a great loathing of trees that block the sun, but still verdant, with great old houses, lovely brick apartments, and lots of variety.

I really dig the "Tudor style" (or "mock Tudor" or "Tudorbethan") medieval-esque apartments, many with round towers. One of the men primarily responsible is Anhalt (and more). Most of the stuff was built in the late '20s just before the Great Depression, in our penultimate building boom. Architecturally most of the buildings are absolutely monstrous; they combine country cottage style elements with pretensions of grandiosity. They mix graceful decorative elements with heavy blocky imposing forms, and it all seems just randomly slapped together with little logic. Still they are like jewels compared to modern construction.

As the Mike Whybark guy says the Loveless building is also quite lovely; sometimes the gate is open and you can wander into the interior courtyard. Bacchus is gone and in its place is Olivar, a very mediocre restaurant, but the Russian-themed murals on the walls are quite remarkable, and apparently original.

Random other page with historic Capitol Hill photos.

It's annoying that "Capitol Hill" also exists in Washington DC.

Other local highlights: mansion row on 14th street is okay of course. The mansions over at Harvard and Prospect are good too; here's an example. Hmm. Weird. Google Maps shows that spot as "Lakeview Place", which is supposedly a park, but I haven't noticed a public park there. It has a bare-bones entry at seattle.gov.

Also I just heard the stairway over at E Blaine and Broadway is nice.

Oh, and the St. Mark's Greenbelt is mostly lame; it should have great views but there are too many trees. I was very amused to find this story by a guy who made a fort in the Greenbelt though.


11-28-08 - Optimization

Yes yes, I'm generally in the "measure carefully and only optimize where needed" camp, which the pretentious and condescending love. Weblog pedants love to point out pointless optimizations and ask "did you really measure it?". Fie.

There's a great advantage to just blindly optimizing (as long as it doesn't severely cost you in terms of clarity or maintainability or simplicity). It lets you stop worrying. It frees up head space. Say you write some really inefficient string class and then you use it all over your code. Every time you use it you have to wonder "is this okay? am I getting killed by this inefficient class?" and then you measure it, and no, it's fine. Then a month later you worry again. Then a month later you worry again. It gives you great peace of mind to know that all your base components are great. It lets you worry about the real issue, which is the higher level algorithm. (Note that micro-optimizing the higher level algorithm prematurely is just always bad.)

11-28-08 - Internet Crap

The explosion of garbage on the internet is effectively now making the internet smaller. As long as you confine yourself to a small arcane field like, I dunno, hash table algorithms, the "internet" seems to work fine - you can do searches and find good pages and such.

But try doing something more mainstream today. Even if you search for something like DXTC you will be deluged with garbage. God help you if you try to find something like product reviews for consumer goods like microwaves or something. What you will find is: automatic aggregator sites that are worthless, sites camping on the keywords that literally have zero real content and just ads, sponsored sites, web forums with lots of retards writing reviews they know nothing about, etc.

The net effect is that smart internet users are disconnecting from the global web and retreating into "neighborhoods". More and more now I go directly to sites I know rather than doing global searches. If I want camera reviews I go to dpreview, if I want information about Windows I go straight to MSDN, and for general information I go to my friends' blogs.

This is really a negative thing. It means I'm not getting information from new sources, I'm not exploring the web and finding new connections. Metaphorically I imagine a vast cityscape of internet information, and there are goods out there I'd like to find, but 90% of the houses are now literally spewing feces and vomit out their doors, so I have built a wall of sandbags around my internet home. I look out on the city of shit and long for a time when I could aimlessly wander its streets and stumble upon real treasures.

11-28-08 - Google = SF and MS = Seattle

Just like Google has amazing maps data only in SF, MS has really amazing maps data in Seattle.

I think those companies are pretty good representatives for their respective cities.

11-28-08 - Chance of CRC being bad

Say I run CRC32 on data chunks. I've got them identified by name so I only care about false equality (collisions) on objects of the same name. I'm going to be conservative and assume I only get 31 good bits out of the CRC, so there are 2^31 values, not a full 2^32. I figure around 20 versions of the data per name max. So the chance of collision is a birthday "paradox" thing (it's not a freaking paradox, it's just probability!):

P = "1 - e^( - 20*19/(2*(2^31)) )" = 8.847564e-008

Now if you have N names, what's the chance that *any* of them has a collision?

C = 1 - (1 - P)^N

So, for what N of objects is C = 50% ?

log(0.5) = N * log(1 - P)

N = log(0.5)/log(1-P)

N = 7,834,327

You can have 7 Million files before you get a 50% chance of any one of them having a collision.

For a more reasonable number like N = 100k , what's the chance of collision?

C = "1 - (1 - 8.847564*(10^-8))^100000" = 0.00881 = 0.881 %
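All those numbers are easy to sanity-check with a few lines of Python:

```python
import math

def collision_prob(bits, versions):
    # birthday bound: P ~= 1 - e^(-k*(k-1) / (2 * 2^bits))
    return 1.0 - math.exp(-versions * (versions - 1) / (2.0 * 2**bits))

def any_collision(P, names):
    # chance that at least one of N independent names has a collision
    return 1.0 - (1.0 - P) ** names

P = collision_prob(31, 20)                  # per-name chance with 20 versions
C = any_collision(P, 100000)                # chance over 100k names
N_half = math.log(0.5) / math.log(1.0 - P)  # N where the any-collision chance hits 50%
```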

These probabilities are very low, and I have been pessimistic so they're actually even lower, but perhaps they are still too high. On any given project, a 0.881% chance of a problem is probably okay, but I have to care about the chance that *any customer* has a problem, which puts me closer to the 7M-files number and means it is likely I would see one problem. Of course a collision is not a disaster. You just tell Oodle to refresh everything manually, and chances are that happens to anybody only once in the next few years.

BTW we used CRC32 to hash all the strings in Oddworld: Stranger's Wrath. We had around 100k unique strings, which is pretty close to the breaking point for CRC32 (unlike the above case, they all had to be unique against each other). Everything was fine until near the end of the project when we got a few name collisions. Oh noes! Disaster! Not really. I just changed one of the names from "blue_rock.tga" to "blue_rock_1.tga" and it no longer collided. Yep, that was a crazy tough problem. (Of course I can't use solutions like that as a library designer.)

11-27-08 - WTF Seattle

It's freezing in our apartment and as usual the landlord has ignored my call. I figured I'd look up the law. WTF Seattle. The law here only requires heat to get you to 58 degrees at night. 58 !?!!? 58 is like hobos-huddling-together level of warmth. Our apartment is somewhere in the low 60's and apparently that's perfectly legal. In SF the requirement is 68 degrees, which is actually a bit retardedly high IMO; plus heat is a lot more crucial here than in SF. A requirement around 65 would be best.

In other WTF Seattle news - there's disgusting rotting leaf slop everywhere around the streets and sidewalks here. Early on when I got here I noticed there was no city street sweeping. No street sweeping at all! I noticed it mainly because you don't have to ever move your car when it's parked on the street, which does save a lot of stress worrying about the street sweeping days. It was fine through the summer, but then all the leaves fell, and it rained and rained, and nobody swept them up. And no I'm not talking about in front of my house, I'm talking about 25% of the homes around the city. Now the fallen wet leaves have rotted and turned into a brown slop that squishes and sticks in your shoes.

My first thought was "they don't bother street sweeping because it rains so much", but the real answer, like most answers to "WTF Seattle" is "no income tax".

ADDENDUM : neighbors are having another backyard fire and sing-along. You can't really possibly imagine how outrageous this is. It's not like we have big properties. The buildings on Capitol Hill are literally less than 10 feet apart, so their fire pit is about 20 feet from our bedroom and the smoke comes right in the window. So I went out to talk to them tonight after midnight when they were having another guitar sing-along around the fire. It seems like a fun hippy crowd; they drink and sing and hang out by the fire; I wish I was doing that. But not right by my freaking bedroom, WTF. I hate the experience of confronting people. I hate being the fucking nosy asshole neighbor. But I hate sitting in here fuming and doing nothing even worse. And once I make a stink I expect results. If they don't cooperate voluntarily I have various mechanisms to extract action, though I hate to get into a whole neighbor war situation. I've never complained to neighbors about noise before in my life and now I've done it three times with two different neighbors here.

Bleck. One thing that bugs me is that when we looked at this apartment we *knew* these things would be issues. We looked at the squeaky old wood floors, and the thin drafty windows, and the hippy house right next door, and we explicitly called out every one of those problems. But we took the place anyway because it was the best thing we'd seen in 8 days of intensive looking and the hotel bill was getting out of hand and the effort of searching for apartments every day had worn us down.

We might have to just move.

Buying a house is so scary to me because you never know if your neighbors are going to be inconsiderate nutjobs. Sure if you get away from the apartment/youngster area you can probably avoid big late night parties, but as you get into the yuppie suburban areas you start getting the Mr. Fix It 7 AM home improvement guy who's constantly redoing everything in his house and doing it himself so he works at all hours and it just never ends.

Buying a condo just seems like madness to me because your quality of life still hinges so strongly on your neighbors and you can't be sure what you're getting into (or that they will stay the same). Plus you have to go through the condo maintenance guy still and just hope he's reasonable. You have almost all the problems of apartments, but you're locked into it so you can't easily flee if it turns out to be a disaster.

I need to buy like 20 acres of undesirable swamp land and just park a trailer in the middle of it. Ah, sweet sleep.


11-26-08 - Oodle

OMG I finally got hot loading sort of working with bundles and override bundles. So you can build a package of a bunch of files. The game runs and grabs the package that pages in. You touch some file, the game sees the change, that causes it to make a single-file-bundle for the changed file, it loads in the new bundle, and then patches the resource table so that the new version is used instead of the old one in the package. Then the Game Dictionary is updated so that future runs will also load the new version.

Phew. It works but it's fragile and I'm not happy with how careful you have to be. There are lots of ways it can fail. For example, the packages all load one by one asynchronously. If you fire the big package load first, it may come in and get hooked up before the override package. Then if you hook up your game resources, you have hooked up to the old (wrong) data. You can prevent this by doing a call on each resource that you get to say "is this the right version of this resource" before hooking up to it. But currently there's just a ton of stuff like that you have to be aware of and make sure to do the right call, etc. etc.

I keep coming back to the problem that I need to know whether I have the right version of the file in a package. There are three options I can see :

1. Fuck it. Do nothing. This requires the game to load the right versions all the time. Clients could, for example, always just work unpackaged while editing, then when they make paging packages they would simply not be editable or incremental-linkable in a nice way. This is not a happy solution but it "works" in the sense that it is *reliable* about not working.

2. Use mod times. This works fine and is easy and fast *as long as you only touch and make files on your local machine*. Which would fail for major game devs since they tend to compile files on some kind of build server, or check in and share compiled files. But if you think about the way programmers work - we don't ever check in our .OBJ files, we just check in sources and everybody has to rebuild on their local machine. If you make your artists do the same thing - only share source files and always local rebuild - OR sync to a full compiled set from the server and only run compiled, but never try to mix local changes with server compiles - then it works fine.

3. Use CRC's. This is the most robust and reliable. The packages store the CRC of the files they were made from. If the source file on disk has a different CRC, we assume that the source file is better. (we don't try to use mod times to tell who's newer because of the server-client mod time problem). If the CRC is different we repackage using the local source file. Then when we load we always prefer packages that have content that matches the local CRC. This works and is stable and good and all. The only problem is all the time you spend doing CRC's on files, which may or may not be a big deal. Obviously running over your whole tree and doing CRC's all the time would be ridiculous, but you can use mod time to tell when to redo the CRC, so you only incrementally redo it. Even getting all the mod times is too slow, so you can use a persistent Watcher app to cache the drive dir listing and file infos.
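A minimal sketch of that incremental CRC idea in Python (my own toy version, not Oodle code) - note that mod time is only used as a cache key to skip re-reading unchanged files, never as an authority on which of two files is newer:

```python
import os
import zlib

_crc_cache = {}   # path -> (mtime, crc)

def cached_crc32(path):
    # If the mod time matches what we saw last time, trust the cached CRC;
    # otherwise re-read the file and recompute.
    mtime = os.path.getmtime(path)
    hit = _crc_cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, "rb") as f:
        crc = zlib.crc32(f.read()) & 0xFFFFFFFF
    _crc_cache[path] = (mtime, crc)
    return crc
```

A persistent watcher app would keep this cache warm across runs instead of rebuilding it each time.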

The "CRC" method is not necessarily "right" in the sense that it doesn't load the newest version of the content, but it is reliable, it always does the same thing - it loads content that corresponds to the source version on the user's disk.

BTW this last thing is something the OS should really do but totally fails on. With the amount of memory we have now, the OS should just keep a listing of every dir you've ever visited in memory at all times. Even if you visit every dir on the disk it would only be 100 Megs. You could cap it at 64 MB or something and LRU, and obviously having 64 MB of dir listings is going to make your dir access instant all the time. I might just write an app to do this myself.

I'm kind of tempted to offer #1 and #3 to clients and not offer #2. I don't want to offer any features that sort of work or will have weird behavior that looks like "bugs".

Also BTW yes of course there will also just be a mode to load what's in the packages and not even look at what's newest or anything. This is what the real shipping game will use of course, once you have your final packages set up it just loads them and assumes you made them right.

11-26-08 - Offices

God damn I hate how you can't open windows in any fucking office buildings any more. That shit should be illegal. We need laws for everything because if it's not a fucking law the fuckers will fuck you over. This office we're in is really very nice so far as offices go, but the air gets all stale, and god forbid if somebody makes something in the microwave it smells the place up for the next 24 hours because THERE'S NO FUCKING WINDOWS. Urg.

I despise the construction of new offices; I think the reason I hate new tract homes and condos so much is that they feel like offices to me. Drywall, aluminum or latex windows, berber carpet, bleck.

On the plus side, new places actually block sound and air and rain pretty well. Our apartment is practically like camping. It's super cold and drafty, sound comes right through the windows like there's nothing there. And of course to complete the camping parallelism, our neighbors light fires in the middle of the night and stand around and drink and smoke and are just generally redneck white trash. Ah, camping, getting away from the crowded city so you can be in an assigned spot right next to a bunch of rednecks that run their car engines all night long, with no walls to protect you from the sound. What a treat.

I'm a really sensitive baby when it comes to work spaces. I need wood and plants and light and fresh air. Ideally 50's modern design, some long horizontal lines ala Richard Neutra or FLW. Little things really bug me, like having people behind my back, or really any kind of distraction when I'm really trying to focus. I understand the value of having everybody in a game company all together in a big open space, and that's cool to some extent, but I can only stand that for maybe an hour out of the day and then I want to retreat into my womb-like office that opens out onto a courtyard with limestone floor, ferns around the edges, a big oak in the middle, and a trickling fountain.

Oh, this is only tangentially related, but this place: The House in Magnolia (Seattle) is pretty good for 50's modern. Unfortunately it's become very popular of late, so it's all very expensive now. People run businesses going to the midwest, where they have no idea what's good or bad, buying up the great 50's designer stuff and bringing it to the coasts.

Continuing the tangent - I'm sitting on Tom's Swopper right now. I've been sitting on physioballs for years, so it's not that different than that really. It's good for you because you sit "actively" and you can't slouch and all that. But I'm starting to think the biggest benefit is that it's just so damn uncomfortable that you move around and get up a lot and don't ever stay in the same position for a long time.

11-25-08 - Funnies

Things to cheer me up :

SFW Porn ZOMG this is so disturbing.

Tuff Fish plays Mario; okay, the audience for this is pretty small; you have to be a video game geek and also be familiar with Tuff Fish and his amazing HYACHACHA moment. If you are, then this is the funniest thing ever.

FAIL is always good.

Charlie sings Night Man; best moment ever on TV.


11-25-08 - Drear

It's so dark and drizzly here all the time. Even on a clear sunny day, the sun doesn't climb more than 20 degrees above the horizon. Today it's like gray pea soup outside. It makes me just want to stay in bed. Wake me in March.

ADDENDUM : one day later it's gorgeous out, so I'm being taught that when I whine in my blog things magically get better. Being the whiney child that I am this is a bad lesson for me.

11-24-08 - Unix Utils

I've always hated "cygwin". I don't want a crappy semi-emulation of a Unix shell. I just want to be able to run some of the Unix utils (like grep and patch) natively as Win32 console apps in my 4DOS (TCC). Most of the POSIX API is available natively in Win32, so just use that! Well, here you go: Unix Utils in native Win32. Yay.

11-24-08 - Lua Coroutines

.. are cool. Coroutines are the awesome way to do game logic that takes time. The script looks like:

if ( GetState(Friends) == dancing )
    LookAt( Friends );
    speech = StartSpeech("Screw you guys");
    Go( Home );

These things take various amounts of time and actually yield back execution to the game while that happens. Each NPC or game object has its own coroutine which the game basically just "pings". The Lua code really only ever runs a few lines, all it does is pick a new task and make the object do that task, and then it yields again. (obviously if it was just a static list like the example above you could do that easily other ways, the good thing about script is it can do conditionals and make decisions as it goes).
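Python generators have the same yield-and-resume shape, so a task like the script above can be sketched like this (the action helpers are all made up for illustration - each yield is one "tick" handed back to the game loop):

```python
def look_at(target):
    # pretend turning toward the target takes two ticks
    for _ in range(2):
        yield "turning toward " + target

def start_speech(line):
    # one tick per word of dialogue
    for word in line.split():
        yield "saying " + word

def go(place):
    yield "walking to " + place

def npc_task():
    # the whole plan reads as straight-line code, but every step
    # yields back to the game loop until that action is finished
    yield from look_at("Friends")
    yield from start_speech("Screw you guys")
    yield from go("Home")

# the game's per-frame tick is just resuming the coroutine
task = npc_task()
log = [step for step in task]   # a real game would resume once per frame
```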

Coco lets you do a real Lua yield & resume from C calls, which lets you build your "threading" into your game's helper functions rather than relying on your designers to write scripts that yield correctly. LuaJIT (at the Coco link) is cool too.

I don't really like the Lua syntax, and I also personally hate untyped languages. (I think the right thing for scripting-style languages is "weakly typed", that is, all numeric types convert silently and things do their ToString and such, but I like to just get a compile error when I try to pass an alligator to the factorial function. Also having explicit types like "location" and "NPC" would be nice.) But having an off-the-shelf scripting language that does the coroutine thing is a pretty big win.

I generally think of simple game AI as being in 2 (or more) parts. The 2 main parts are the "brain" (or reactions) and the "job" (or task). The job/task is what the guy is currently doing and involves a plan over time and a sequence of actions. This is best done with a coroutine of some kind. The brain/reaction part is just a tick every frame that looks around and maybe changes the current task. eg. it's stuff like "saw the player, go into attack job". That can be well implemented as a regular Lua function call that the game just calls every frame, it should just do stuff that doesn't take any time and return immediately.


11-24-08 - The Economy and the Wrong Argument

The pro-corporate anti-regulation junta has already won because they've succeeded in making the debate about whether you support the "free market" or not. Umm, as opposed to what? Who exactly is on the other side? There's zero debate here, it's a common tactic, they just keep saying "I believe in the free market" even though nobody is saying that they don't. And that really has absolutely nothing to do with what we should be talking about.

The reality is that we don't have a completely free market and we *shouldn't* and no sane person would say that we should. We have regulations and laws. Those laws protect consumers, they stabilize the economy (or at least they're supposed to), and they help companies.

In fact, at the most fundamental, the entire existence of the idea of a "Corporation" is a fabrication which is enforced by our government and in fact is a very powerful distortion of a true anarchistic/libertarian free market. The corporation allows individuals to take very great risks without personal responsibility. That's okay if it is balanced by requirements that make sure they are supportive of the greater good of the nation. Furthermore, the whole Federal Reserve system is a big distortion of "free market"; it's sort of ironic for these Federal Reserve chiefs to be lobbying for the free market; umm.. so you're giving cheap money to a select few institutions, what exactly is free about that? But again, that's fine, the Fed is a reasonable thing to have to stabilize the economy for the greater good of the nation - but only if the people getting Fed money are held responsible.

The debate is not about "free market" vs. "not free market". The debate should be about what kind of regulatory structure will be best for the people of America as a whole. Often what's good for certain large businesses is also what's good for America, but not always. The real debate here is not about "free markets" but about corporatism - the idea that if the government acts in the best interest of large corporations that it is also helping the majority of Americans.

I just randomly found this blog by another programmer with liberal leanings who actually does some research and doesn't just write nonsense like me.

11-23-08 - Hashing & Cache Line Size

A while ago I wrote about hash tables and reprobing and cache sizes. I mentioned Cliff Click's nonblocking hash table, which is sort of interesting to me in theory but also scary and not something I really need at the moment. Anyhoo, I was looking at some dude's C version of the lock-free hash table and I noticed a little thing that's relevant to my old discussion.

He explicitly uses the cache line size and does +1 linear stepping along the cache line, then does a complex reprobe by rehashing, then does +1 again while in the cache line, then rehashes, etc. That seems like a pretty reasonable thing to do. I guess I considered that at the time, but didn't want to do it because it requires you to explicitly know the cache line size of the machine you're on. It literally requires a hard-coded:

NumLinearSteps = CACHE_LINE_SIZE / sizeof(table_entry);
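In Python pseudocode (sizes are hypothetical, and the table size is assumed to be a multiple of the entries per line), the resulting probe order looks something like this - finish the current cache line with +1 steps, then rehash to jump to a fresh line:

```python
CACHE_LINE_SIZE = 64   # bytes, typical x86
ENTRY_SIZE = 8         # sizeof(table_entry), hypothetical
ENTRIES_PER_LINE = CACHE_LINE_SIZE // ENTRY_SIZE

def probe_sequence(h, table_size, max_probes=16):
    # walk +1 through the rest of the current cache line, then rehash
    # to a new line, and repeat until we've emitted max_probes slots
    probes = []
    idx = h % table_size
    while len(probes) < max_probes:
        line_start = idx - (idx % ENTRIES_PER_LINE)
        for i in range(idx, line_start + ENTRIES_PER_LINE):
            probes.append(i % table_size)
            if len(probes) == max_probes:
                return probes
        h = (h * 2654435761 + 12345) & 0xFFFFFFFF   # cheap integer rehash
        idx = h % table_size
    return probes
```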

Some other blogs about the lock-free hash have extensive link lists.


11-22-08 - WTF Seattle

It's midnight and my neighbors just lit a fire in their back yard. WTF is wrong with this city?

I'm a little torn because hey, people can have parties once in a while, we don't need to forbid all after midnight socializing. There's a certain social contract to being neighbors in the city, mostly it means being reasonably quiet, but it also means sometimes being tolerant of others' noise.

Ugh, I was about to go to bed and now I'm so full of rage and self loathing that I'll be up for hours.

Double WTF some drunk driver just ran into the telephone pole in front of my apartment.

All of this would not really be exceptional in a big city like NY or SF or whatever, but Seattle is really quite small and feels completely dead at night, the streets are empty and yet there's a high concentration of craziness right next to me.

11-22-08 - Stupid Google Searches

I've done some really bone-headed foolish google searches recently.

The other day Casey was talking about buying a "CT Butt" which I guess is some type of pork roast, so of course I go straight to google and search Images for "CT Butt". Yeah, dumb.

Now today I just read that Hung from Top Chef is gay, which surprised me, so I figured I'd google and see if it was true. So of course I just go to google and type in "Hung Gay". Umm.. yeah. Whoah, rookie mistake. Of course all those people named "Hung" or "Wang" are really just setting themselves up for google problems.

11-22-08 - Rasterization

I just found this pretty sweet introduction article on barycentric rasterization from 2004. It's not super advanced, but it starts at the beginning and works through things and is very readable. There are some dumb things in the block checking, so if you care go to the last page and see the posts by "rarefluid".

BTW the "edge equations" are really 2d plane equations (edge cross products). Checking just the edge planes against 8x8 blocks is only a rough quick reject. You can have blocks that are right outside of one vertex at an acute corner, and those blocks are "inside" all the planes but outside of the triangle. The code they have posted also checks against the bounding box of the whole triangle which largely fixes this case. At most they will consider one extra 8x8 block which doesn't actually contain any pixels.
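Here's a sketch of that corner test in Python (my own toy version, not the article's code), for a counter-clockwise triangle where edge values >= 0 mean inside:

```python
def edge(ax, ay, bx, by, px, py):
    # 2D cross product; >= 0 means p is on the inside of edge a->b
    # for a counter-clockwise triangle
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def classify_block(bx, by, tri, size=8):
    # Test the four corners of a size x size block against the three
    # edge equations: 'out' if all corners are outside some edge,
    # 'in' if all corners are inside every edge, else 'partial'.
    # (This alone can accept an empty block just outside an acute
    # vertex, which is why the triangle bbox check is also needed.)
    corners = [(bx, by), (bx + size, by), (bx, by + size), (bx + size, by + size)]
    fully_inside = True
    for i in range(3):
        ax, ay = tri[i]
        cx, cy = tri[(i + 1) % 3]
        vals = [edge(ax, ay, cx, cy, px, py) for px, py in corners]
        if all(v < 0 for v in vals):
            return 'out'
        if any(v < 0 for v in vals):
            fully_inside = False
    return 'in' if fully_inside else 'partial'
```

Fully-inside blocks can skip the per-pixel edge tests entirely, which is where the speed comes from.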

(It's also really not yet a full barycentric rasterizer; he's just doing the edge tests that way. From his other posts I figure he's doing interpolation the normal homogeneous way, but if you're doing the edge tests like that then you should just go ahead and do your interpolation barycentric too.)

This kind of block-based barycentric rasterizer is very similar to what hardware does. One of the nice things about it is the blocks can easily be dispatched to microthreads to rasterize in parallel, and the blocks are natural quanta to check against a coarse Z buffer.

The old Olano-Greer paper about homogeneous coordinate rasterization is now online in HTML. Some other random junk I found that I think is just junk: Reducing divisions for polygon mappers & Triangle Setup.

This blog about Software occlusion culling is literally a blast from the past. As I've often suggested, if you care about that stuff, the SurRender Umbra technical manual is still the godly bible on all things occlusion (or you can read my ancient article on it). But I also still think that object-based occlusion like that is just a bad idea.

Coarse Z that's updated by the rasterizer is however a good thing. Doing your own on top of what the card does is pretty lame though. This is yet another awesome win from Larrabee. If we/they do a coarse Z buffer, it can get used by the CPUs to do whole-object rejections, or whole-triangle rejections, or macroblock rejections.

Apparently the guy who wrote that top article is Nicolas Capens; he wrote "swShader", an open source DX9 software rasterizer, which got taken down and is now a commercial product (which was a silly thing to do; of course any customer would rather buy Pixomatic !!). I learned this from a random flame war he got in. Don't you love the internet?


11-21-08 - DXTC Part 4

So I finally implemented for myself the end point lsqr fit from indices thing that Simon does. One thing immediately fell out - doing just 4 means and then an end point fit pretty much equals the best mode of Squish. That's very cool because it's quite fast. Note that this is not doing all the searches of all possible clusterings - I just pick one clustering from 4 means and then optimize the end points for those indices. (When I do 4 means I actually try a few different ways as I mentioned previously; I try all the permutations of putting the 4 means on the 4 palette entries, which is 4!/2 = 12 ways, but then I only optimize the best one of those, so it's still very fast.)

One thing I noticed is that the lsqr fit really doesn't do much other than shift an end point by one. That is, the end points are in 565 already; you can do this nice optimization in floats, but when you quantize back to 565 you pretty much always hit the point you started with, or at most a step of 1.
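For reference, here's my sketch of that least-squares endpoint fit in floats (the normal-equations idea, not Simon's actual code): with the palette indices held fixed, the endpoints minimizing the squared error solve a 2x2 linear system shared by all three channels.

```python
def lsqr_endpoints(colors, indices):
    # DXT1 4-color palette: index -> fraction of endpoint A
    # (0 = A, 1 = B, 2 = 2/3 A + 1/3 B, 3 = 1/3 A + 2/3 B)
    weights = (1.0, 0.0, 2.0 / 3.0, 1.0 / 3.0)
    # accumulate the 2x2 normal equations for
    #   minimize sum || w_i*A + (1-w_i)*B - c_i ||^2
    ww = wu = uu = 0.0
    wc = [0.0, 0.0, 0.0]
    uc = [0.0, 0.0, 0.0]
    for c, i in zip(colors, indices):
        w = weights[i]
        u = 1.0 - w
        ww += w * w; wu += w * u; uu += u * u
        for k in range(3):
            wc[k] += w * c[k]
            uc[k] += u * c[k]
    det = ww * uu - wu * wu   # singular if all indices share one weight
    A = [(uu * wc[k] - wu * uc[k]) / det for k in range(3)]
    B = [(ww * uc[k] - wu * wc[k]) / det for k in range(3)]
    return A, B
```

The floating-point result then gets quantized back to 565, which is why the net effect is usually at most a one-step shift.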

So the new "CB" modes are:

CB 1 = just 4 means then lsqr fit; faster than Squish, a bit slower than ryg. Quality is roughly competitive with Squish, but they have different strengths and weaknesses, so taking the best of the two might be reasonable. Squish never beats "CB 1" by a lot, but "CB 1" kills it on the weird "frymire" image.

CB 2 = CB 1 followed by simulated annealing.

CB 3 = Very slow heavy optimization. This is an attempt to see what "optimal" would get you. It tries "CB 2" and then it also tries using all 16 colors in the block as end points, so 16*16 trials, does the lsqr optimization on each trial, and then anneals the best of those. There are still various other things to try that might find a better result, but this is already pretty crazy slow. This is too slow to use in real production, the idea is simply to get an idea of how close "CB 2" is to optimal. Of course "CB 3" could still be far off optimal, I'm only conjecturing that it's close.

One of the interesting things to look at is the curve of diminishing returns from CB1 to 2 to 3. In most cases there's only a small improvement from 2 to 3, but there are exceptions, mainly in the weird degenerate images. kodim02 is one case (this is a photo but it's almost all red), and frymire of course. That meets expectations. On noisy natural images, the cluster of colors is pretty close to Gaussian noise, which works well with the PCA single line fit and the least-squares continuous distribution approximation. On weird images with degenerate cases there can be stranger optimal solutions (for example: putting one of the color end points outside of the volume of original colors, so that one of the interpolated 1/3 colors can hit a certain value more precisely).

ADDENDUM : you might validly object and ask why the annealing is not getting closer to the global optimum. There are some approximations in the annealing that are hurting. For one thing I only try wiggling the ends by 1 step in 565. Then I don't really run it for very long, so it doesn't have a chance to make big walks and get to really different solutions. All it can really do is local optimizations with small steps to tunnel past local barriers to find better minima - it's not trying huge steps to other parts of the solution space. Theoretically if I ran a much longer annealing schedule with more time spent at higher temperatures it would do a better job of finding the global minimum. But I'm happy with this approach - the annealing is just an improved local optimization that steps past small barriers, and to find drastically different global solutions I have to seed the trial differently.

The new numbers :

file CB 1 CB 2 CB 3 Squish opt Squish ryg D3DX8 FastDXT
kodim01.bmp 8.447 8.3145 8.2678 8.2829 8.3553 8.9185 9.8466 9.9565
kodim02.bmp 5.6492 5.4759 5.2864 6.1079 6.2876 6.8011 7.4308 8.456
kodim03.bmp 4.7533 4.6776 4.6591 4.7869 4.9181 5.398 6.094 6.4839
kodim04.bmp 5.5234 5.4286 5.3967 5.6978 5.8116 6.3424 7.1032 7.3189
kodim05.bmp 9.7619 9.6171 9.5654 9.6493 9.7223 10.2522 11.273 12.0156
kodim06.bmp 7.2524 7.1395 7.1086 7.15 7.2171 7.6423 8.5195 8.6202
kodim07.bmp 5.7557 5.6602 5.634 5.784 5.8834 6.3181 7.2182 7.372
kodim08.bmp 10.3879 10.2587 10.2056 10.2401 10.3212 10.8534 11.8703 12.2668
kodim09.bmp 5.3242 5.2477 5.2247 5.2935 5.3659 5.7315 6.5332 6.6716
kodim10.bmp 5.2564 5.1818 5.1657 5.2478 5.3366 5.7089 6.4601 6.4592
kodim11.bmp 6.7614 6.6503 6.6139 6.731 6.8206 7.3099 8.1056 8.2492
kodim12.bmp 4.8159 4.747 4.7308 4.7968 4.8718 5.342 6.005 6.0748
kodim13.bmp 11.0183 10.8489 10.7894 10.8684 10.9428 11.6049 12.7139 12.9978
kodim14.bmp 8.4325 8.3105 8.2723 8.3062 8.3883 8.8656 9.896 10.8481
kodim15.bmp 5.6871 5.5891 5.548 5.8304 5.9525 6.3297 7.3085 7.4932
kodim16.bmp 5.1351 5.0439 5.0274 5.065 5.1629 5.5526 6.3361 6.1592
kodim17.bmp 5.5999 5.5146 5.4976 5.509 5.6127 6.0357 6.7395 6.8989
kodim18.bmp 8.1345 8.0103 7.9791 7.9924 8.0897 8.6925 9.5357 9.7857
kodim19.bmp 6.6903 6.5979 6.5645 6.5762 6.652 7.2684 7.9229 8.0096
kodim20.bmp 5.4532 5.3825 5.3582 5.4568 5.5303 5.9087 6.4878 6.8629
kodim21.bmp 7.2207 7.1046 7.0718 7.1351 7.2045 7.6764 8.4703 8.6508
kodim22.bmp 6.52 6.4191 6.3933 6.4348 6.5127 7.0705 8.0046 7.9488
kodim23.bmp 4.9599 4.8899 4.8722 4.9063 5.0098 5.3789 6.3057 6.888
kodim24.bmp 8.5761 8.4633 8.4226 8.4299 8.5274 8.9206 9.9389 10.5156
clegg.bmp 14.8934 14.8017 14.6102 14.9736 15.2566 15.7163 21.471 32.7192
FRYMIRE.bmp 7.4898 7.3461 6.0851 10.7105 12.541 12.681 16.7308 28.9283
LENA.bmp 7.1286 7.0286 6.9928 7.1432 7.2346 7.6053 8.742 9.5143
MONARCH.bmp 6.6617 6.5616 6.5281 6.5567 6.6292 7.0313 8.1053 8.6993
PEPPERS.bmp 6.0418 5.9523 5.8026 6.4036 6.5208 6.9006 8.1855 8.8893
SAIL.bmp 8.4339 8.3077 8.2665 8.3254 8.3903 8.9823 9.7838 10.5673
SERRANO.bmp 6.7347 6.2445 5.9454 6.3524 6.757 7.0722 9.0549 18.3631
TULIPS.bmp 7.6491 7.5406 7.5065 7.5805 7.656 8.0101 9.3817 10.5873
lena512ggg.bmp 4.8961 4.8395 4.8241 4.8426 4.915 5.1986 6.0059 5.5247
lena512pink.bmp 4.6736 4.5922 4.5767 4.5878 4.6726 5.0987 5.8064 5.838
lena512pink0g.bmp 3.7992 3.7523 3.7455 3.7572 3.8058 4.2756 5.0732 4.8933
linear_ramp1.BMP 1.5607 1.348 1.3513 1.4035 2.1243 2.0939 2.6317 3.981
linear_ramp2.BMP 1.5097 1.2772 1.2772 1.3427 2.3049 1.9306 2.5396 4.0756
orange_purple.BMP 2.9842 2.9048 2.9074 2.9125 3.0685 3.2684 4.4123 7.937
pink_green.BMP 3.2837 3.2041 3.2031 3.2121 3.3679 3.7949 4.5127 7.3481
sum : 250.8565 246.2747 243.2776 252.3828 259.7416 275.5826 318.5562 370.8691

11-21-08 - More Texture Compression Nonsense

I guess the DX11 Technical Preview was just publicly released a few weeks ago. Unfortunately it's semi-crippled and still doesn't have information about BC7. From what I gather though BC7 does seem to be pretty high quality.

There are multiple issues here. One is providing data to the card in a way that's good for texture cache usage (DXT1 is a big winner here, especially on the RSX apparently). Another is keeping data small in memory so you can hold more. Obviously that's a bigger issue on the consoles than the PC, but you always want things smaller in memory if you can. Another issue is paging off disk quickly. Smaller data loads faster, though that's not a huge issue off hard disk, and even off DVD if you're doing something like SVT you are so dominated by seek times that file sizes may not be that big a deal. Another issue is the size of your content for transmission, either on DVD or over the net; it's nice to make your game downloadable and reasonably small.

I guess that the Xenon guys sort of encourage you to use PCT to pack your textures tiny on disk or for transmission. PCT seems to be a variant of HD-photo. MCT is I guess a DXT-recompressor, but I can't find details on it. I'm not spilling any beans that you can't find at MS Research or that end users aren't figuring out or PS3 developers are hinting at.

The "Id way" I guess is storing data on disk in JPEG, paging that, decompressing, then recompressing to DXT5-YCoCg. That has the advantage of being reasonably small on disk, which is important if you have a huge unique textured world so you have N gigs of textures. But I wonder what kind of quality they really get from that. They're using two different lossy schemes, and when you compress through two different lossy schemes the errors usually add up. I would guess that the total error from running through both compressors puts them in the ballpark of DXT1. They're using 8 bits per pixel in memory, and presumably something like 1-2 bits per pixel on disk.

Instead you could just use DXT1, at 4 bits per pixel in memory, and do a DXT1-recompressor, which I would guess could get around 2 bits per pixel on disk. DXT1-recompressed is lower quality than JPEG, but I wonder how it compares to JPEG-then-DXT5?

If I ignore the specifics of the Id method or the XBox360 for the moment, the general options are :

1. Compress textures to the desired hardware-ready format (DXT1 or DXT5 or whatever) with a high quality offline compressor. Store them in memory in this format. Recompress with a shuffler/delta algorithm to make them smaller for transmission and on disk, but don't take them out of hardware-ready format. One disadvantage of this is that if you have to support multiple hardware-ready texture formats on the PC you have to transmit them all or convert.

2. Compress textures on disk with a lossy scheme that's very tight and has good RMSE/size performance. Decompress on load and then recompress to hardware-ready format (or not, if you have enough video memory). One advantage is you can look at the user's machine and decide what hardware-ready format to use. A disadvantage is that the realtime DXT compressors are much lower quality, and even though they are very fast the decompress-recompress is still a pretty big CPU load for paging.

3. Hybrid of the two : use some very small format for internet transmission or DVD storage, but when you install to HD do the decompress and recompress to hardware-ready format then. Still has the problem that you're running two different lossy compressors, which is evil, but it reduces CPU load during game runs.

I don't think that having smaller data on disk is much of a win for paging performance. If you're paging any kind of reasonable fine-grained unit you're just so dominated by seek time, and hard disk throughput is really very fast (and it's all async anyway so the time doesn't hurt). For example a 256 x 256 texture at 4 bits per pixel is 32k bytes which loads in 0.32 millis at 100 MB/sec. (BTW that's a handy rule of thumb - 100k takes 1 milli). So the real interesting goals are : A) be small in memory so that you can have a lot of goodies on screen, and B) be small for transmission to reduce install time, download time, number of DVDs, etc.
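To sanity-check that arithmetic, here's a purely illustrative helper (hypothetical code, assuming "100 MB/sec" means 10^8 bytes/sec, which is what makes the rule of thumb come out exact) :

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not from any real codebase : milliseconds to load
// a given number of bytes at 100 MB/sec (taking 1 MB = 10^6 bytes).
double load_millis(double bytes)
{
    const double bytes_per_sec = 100.0e6;   // 100 MB/sec hard disk throughput
    return bytes / bytes_per_sec * 1000.0;
}
```

A 256 x 256 texture at 4 bits per pixel is 256*256/2 = 32768 bytes, and load_millis(32768) comes out to about 0.33 ms, in line with the 0.32 figure above (the small difference is just whether you call a MB 10^6 or 2^20 bytes).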

One idea I've been tossing around is the idea of using lossy compressed textures in memory as a texture cache to avoid hitting disk. For example, a 256 x 256 texture at 1 bit per pixel is only 8k bytes. If you wanted to page textures in that unit size you would be ridiculously dominated by seek time. Instead you could just hold a bunch of them in memory and decompress to "page" them into usable formats. The disk throughput is comparable to the decompressor throughput, but not having seek times means you can decompress them on demand. I'm not convinced that this is actually a useful thing though.

BTW I also found this Old Epic page about the NVidia DXT1 problem which happily is fixed but I guess there are still loads of those old chips around; this might still make using DXT1 exclusively on the PC impossible. The page also has some good sample textures to kill your compressor with.


11-20-08 - Pointless

I keep thinking about pointless things and I have to remind myself to stop.

One is texture compression, as I just mentioned. As long as the hardware needs DXTC, there's really not much I can or *should* do. I'd have to decompress then recompress to DXTC which is just pointless. Stop thinking about it.

Another is antialiased rasterization. Particularly in LRB I think you could do a pretty sweet exact antialiased rasterizer. But it's pretty pointless for 3d. It might be nice for 2d, for fonts and vector graphics and such, but in 3d the bad aliasing isn't even because of edges. The bad aliasing comes from various things - mainly failing to filter textures well when they aren't plain color textures (eg. mip-mapping normal maps), or aliasing due to lighting, and especially reflections (BTW specular is a type of reflection). Fresnel specular reflections are the worst because they're strong right at edges where you have very little precision, so they sparkle like crazy. To really handle these cases you need something like adaptive super-sampling (putting more samples where you need more detail). And to really do that right you need your shaders to return frequency information or at least derivatives.

... but I'm not thinking about that any more because it's pointless.

Oh and I should mention some other things about texture compression while they're in my head.

1. While the fixed block size is really bad for RMSE, it's not quite so bad for perceptual error. What it means is that you put "too many" bits in flat areas and send them almost perfectly. In noisy areas you don't have enough bits and send them very badly. But that's sort of what you want perceptually anyway. It's not actually ideal, but it's also not horrible.

2. One thing that none of these formats do is take advantage of mips. Assuming you always have a full mip chain, and you're doing trilinear filtering - then any time you need mip M the hardware already has mip M-1. Obviously you could do very good delta compression using the previous mip. For something like paging the situation is a little different, but you could certainly require that you have the 16x16 mip always in memory and at least use that for conditioning. I haven't really thought about this in detail, but obviously sending a whole mip chain without using the information between the levels is very wasteful.
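As a sketch of what that delta compression might look like (hypothetical code, not from any real codec; it assumes plain 2x nearest-neighbor upsampling and a single-channel image for simplicity) :

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Predict mip level M from the next-coarser mip M-1 by 2x nearest-neighbor
// upsampling, and return the residuals. On smooth images the residuals are
// small and cheap to entropy code; a real codec would use a better filter.
struct Mip { int w, h; std::vector<uint8_t> pix; };

std::vector<int> mip_residuals(const Mip& fine, const Mip& coarse)
{
    // coarse is assumed to be exactly half the resolution of fine
    std::vector<int> res(fine.w * fine.h);
    for (int y = 0; y < fine.h; y++)
        for (int x = 0; x < fine.w; x++)
        {
            int pred = coarse.pix[(y/2) * coarse.w + (x/2)];
            res[y * fine.w + x] = (int)fine.pix[y * fine.w + x] - pred;
        }
    return res;
}
```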

11-20-08 - DXTC Part 3

So we know DXT1 is bad compared to a variable bitrate coder, but it is a small block fixed rate coder, which is pretty hard to deal with.

Small block inherently gives up a lot of coding capability, because you aren't allowed to use any cross-block information. H264 and HDPhoto are both small block (4x4) but both make very heavy use of cross-block information for either context coding, DC prediction, or lapping. Even JPEG is not a true small block coder because it has side information in the Huffman table that captures whole-image statistics.

Fixed bitrate blocks inherently give up even more. That kills your ability to do any rate-distortion type of optimization. You can't allocate bits where they're needed. You might have images with big flat sections where you are actually wasting bits (you don't need all 64 bits for a 4x4 block), and then you have other areas that desperately need a few more bits, but you can't give them to them.

So, what if we keep ourselves constrained to the idea of a fixed size block and try to use a better coder? What is the limit on how well you can do with those constraints? I thought I'd see if I could answer that reasonably quickly.

What I made is an 8x8 pixel fixed rate coder. It has zero side information, eg. no per-image tables. (it does have about 16 constants that are used for all images). Each block is coded to a fixed bit rate. Here I'm coding to 4 bits per pixel (the same as DXT1) so that I can compare RMSE directly, which is a 32 byte block for 8x8 pixels. It also works pretty well at 24 byte blocks (which is 1 bit per byte), or 64 for high quality, etc.

This 8x8 coder does a lossless YCoCg transform and a lossy DCT. Unlike JPEG, there is no quantization, no subsampling of chroma, no huffman table, etc. Coding is via an embedded bitplane coder with zerotree-style context prediction. I haven't spent much time on this, so the coding schemes are very rough. CodeTree and CodeLinear are two different coding techniques, and neither one is ideal.

Obviously going to 8x8 instead of 4x4 is a big advantage, but it seems like a more reasonable size for future hardware. To really improve the coding significantly on 4x4 blocks you would have to start using something like VQ with a codebook which hardware people don't like.

In the table below you'll see that CodeTree and CodeLinear generally provide a nice improvement on the natural images, about 20%. In general they're pretty close to half way between DXTC and the full image coder "cbwave". They have a different kind of perceptual artifact when they have errors - unlike DXTC, which just makes things really blocky, these get the halo ringing artifacts like JPEG (it's inherent in truncating DCT's).

The new coders do really badly on the weird synthetic images from bragzone, like clegg, frymire and serrano. I'd have to fix that if I really cared about these things.

One thing that is encouraging is that this coder does *very* well on the simple synthetic images, like the "linear_ramp" and the "pink_green" and "orange_purple". I think these synthetic images are a lot like what game lightmaps are like, and the new schemes are near lossless on them.

BTW image compression for paging is sort of a whole other issue. For one thing, a per-image table is perfectly reasonable to have, and you could work on something like 32x32 blocks. But more important is that in the short term you still need to provide the image in DXT1 to the graphics hardware. So you either just have to page the data in DXT1 already, or you have to recompress it, and as we've seen here the "real-time" DXT1 recompressors are not high enough quality for ubiquitous use.

ADDENDUM I forgot but you may have noticed the "ryg" in this table is also not the same as previous "ryg" - I fixed a few of the little bugs and you can see the improvement here. It's still not competitive, I think there may be more errors in the best fit optimization portion of the code, but he's got that code so optimized and obfuscated I can't see what's going on.

Oh, BTW the "CB" in this table is different than the previous table; the one here uses 4-means instead of 2-means, seeded from the pca direction, and then I try using each of the 4 means as endpoints. It's still not quite as good as Squish, but it's closer. It does beat Squish on some of the more degenerate images at the bottom, such as "linear_ramp". It also beats Squish on artificial tests like images that contain only 2 colors. For example on linear_ramp2 without optimization, 4-means gets 1.5617 while Squish gets 2.3049 ; most of that difference goes away after annealing though.

file Squish opt Squish CB opt CB ryg D3DX8 FastDXT cbwave KLTY CodeTree CodeLinear
kodim01.bmp 8.2808 8.3553 8.3035 8.516 8.9185 9.8466 9.9565 2.6068 5.757835 5.659023
kodim02.bmp 6.1086 6.2876 6.1159 6.25 6.8011 7.4308 8.456 1.6973 4.131007 4.144241
kodim03.bmp 4.7804 4.9181 4.7953 4.9309 5.398 6.094 6.4839 1.3405 3.369018 3.50115
kodim04.bmp 5.6913 5.8116 5.7201 5.8837 6.3424 7.1032 7.3189 1.8076 4.254454 4.174228
kodim05.bmp 9.6472 9.7223 9.6766 9.947 10.2522 11.273 12.0156 2.9739 6.556041 6.637885
kodim06.bmp 7.1472 7.2171 7.1596 7.3224 7.6423 8.5195 8.6202 2.0132 5.013081 4.858232
kodim07.bmp 5.7804 5.8834 5.7925 5.9546 6.3181 7.2182 7.372 1.4645 3.76087 3.79437
kodim08.bmp 10.2391 10.3212 10.2865 10.5499 10.8534 11.8703 12.2668 3.2936 6.861067 6.927792
kodim09.bmp 5.2871 5.3659 5.3026 5.4236 5.7315 6.5332 6.6716 1.6269 3.473094 3.479715
kodim10.bmp 5.2415 5.3366 5.2538 5.3737 5.7089 6.4601 6.4592 1.7459 3.545115 3.593297
kodim11.bmp 6.7261 6.8206 6.7409 6.9128 7.3099 8.1056 8.2492 1.8411 4.906141 4.744971
kodim12.bmp 4.7911 4.8718 4.799 4.9013 5.342 6.005 6.0748 1.5161 3.210518 3.231271
kodim13.bmp 10.8676 10.9428 10.9023 11.2169 11.6049 12.7139 12.9978 4.1355 9.044009 8.513297
kodim14.bmp 8.3034 8.3883 8.3199 8.5754 8.8656 9.896 10.8481 2.4191 6.212482 6.222196
kodim15.bmp 5.8233 5.9525 5.8432 6.0189 6.3297 7.3085 7.4932 1.6236 4.3074 4.441998
kodim16.bmp 5.0593 5.1629 5.0595 5.1637 5.5526 6.3361 6.1592 1.546 3.476671 3.333637
kodim17.bmp 5.5019 5.6127 5.51 5.6362 6.0357 6.7395 6.8989 1.7166 4.125859 4.007367
kodim18.bmp 7.9879 8.0897 8.0034 8.225 8.6925 9.5357 9.7857 2.9802 6.743892 6.376692
kodim19.bmp 6.5715 6.652 6.5961 6.7445 7.2684 7.9229 8.0096 2.0518 4.45822 4.353687
kodim20.bmp 5.4533 5.5303 5.47 5.5998 5.9087 6.4878 6.8629 1.5359 4.190565 4.154571
kodim21.bmp 7.1318 7.2045 7.1493 7.3203 7.6764 8.4703 8.6508 2.0659 5.269787 5.05321
kodim22.bmp 6.43 6.5127 6.4444 6.6185 7.0705 8.0046 7.9488 2.2574 5.217884 5.142252
kodim23.bmp 4.8995 5.0098 4.906 5.0156 5.3789 6.3057 6.888 1.3954 3.20464 3.378545
kodim24.bmp 8.4271 8.5274 8.442 8.7224 8.9206 9.9389 10.5156 2.4977 7.618436 7.389021
clegg.bmp 14.9733 15.2566 15.1516 16.0477 15.7163 21.471 32.7192 10.5426 21.797655 25.199576
FRYMIRE.bmp 10.7184 12.541 11.9631 12.9719 12.681 16.7308 28.9283 6.2394 21.543401 24.225852
LENA.bmp 7.138 7.2346 7.1691 7.3897 7.6053 8.742 9.5143 4.288 7.936599 8.465576
MONARCH.bmp 6.5526 6.6292 6.5809 6.7556 7.0313 8.1053 8.6993 1.6911 5.880189 5.915117
PEPPERS.bmp 6.3966 6.5208 6.436 6.6482 6.9006 8.1855 8.8893 2.3022 6.15367 6.228315
SAIL.bmp 8.3233 8.3903 8.3417 8.5561 8.9823 9.7838 10.5673 2.9003 6.642762 6.564393
SERRANO.bmp 6.3508 6.757 6.5572 6.991 7.0722 9.0549 18.3631 4.6489 13.516339 16.036401
TULIPS.bmp 7.5768 7.656 7.5959 7.8172 8.0101 9.3817 10.5873 2.2228 5.963537 6.384049
lena512ggg.bmp 4.8352 4.915 4.8261 4.877 5.1986 6.0059 5.5247 2.054319 2.276361
lena512pink.bmp 4.5786 4.6726 4.581 4.6863 5.0987 5.8064 5.838 3.653436 3.815336
lena512pink0g.bmp 3.7476 3.8058 3.7489 3.8034 4.2756 5.0732 4.8933 4.091045 5.587278
linear_ramp1.BMP 1.4045 2.1243 1.3741 1.6169 2.0939 2.6317 3.981 0.985808 0.984156
linear_ramp2.BMP 1.3377 2.3049 1.3021 1.5617 1.9306 2.5396 4.0756 0.628664 0.629358
orange_purple.BMP 2.9032 3.0685 2.9026 2.9653 3.2684 4.4123 7.937 1.471407 2.585087
pink_green.BMP 3.2058 3.3679 3.2 3.2569 3.7949 4.5127 7.3481 1.247967 1.726312


11-18-08 - DXTC Part 2

First of all, let's dispel the illusion that some people have that DXT1 is "pretty good". DXT1 is fucking awful. It compresses at 1.33333 bits per byte (4 bits per pixel). That's very large as far as image compressors are concerned. For typical images, around 4.0 bpb is true lossless, around 1.5 is perceptually lossless, and around 0.5 is "very good". In fact wavelet compressors can get as low as 0.1 bpb and achieve about the same quality as DXT1. Despite this I've heard smart people saying that "DXT1 is pretty good". Yes, it is a convenient fixed size block, and yes the decoder is extremely fast and simple, but as far as quality is concerned it is not even close. At 1.33333 it should be near lossless.

Here are some numbers on various DXT1 compressors. The numbers in the table are the RMSE (sqrt of the L2 error). The far right column is a wavelet compressor for comparison; it's not the best wavelet compressor in the world by a long shot, it's "cbwave" which is very old and which I designed for speed, not maximum quality. In any case it gives you an idea how far off DXT1 is. (BTW I always try to show results in RMSE because it is linear in pixel magnitude, unlike MSE or PSNR which are very weird nonlinear scales). More discussion after the table...

file Squish opt Squish CB opt CB ryg D3DX8 FastDXT cbwave
kodim01.bmp 8.2808 8.3553 8.352 8.6924 9.374 9.8466 9.9565 2.6068
kodim02.bmp 6.1086 6.2876 6.1287 6.3025 7.52 7.4308 8.456 1.6973
kodim03.bmp 4.7804 4.9181 4.8312 5.0225 5.855 6.094 6.4839 1.3405
kodim04.bmp 5.6913 5.8116 5.7285 5.9394 6.9408 7.1032 7.3189 1.8076
kodim05.bmp 9.6472 9.7223 9.707 10.112 10.8934 11.273 12.0156 2.9739
kodim06.bmp 7.1472 7.2171 7.1777 7.44 8.1005 8.5195 8.6202 2.0132
kodim07.bmp 5.7804 5.8834 5.8379 6.0583 6.8153 7.2182 7.372 1.4645
kodim08.bmp 10.2391 10.3212 10.346 10.747 11.3992 11.8703 12.2668 3.2936
kodim09.bmp 5.2871 5.3659 5.3306 5.5234 5.9884 6.5332 6.6716 1.6269
kodim10.bmp 5.2415 5.3366 5.2777 5.4633 5.9377 6.4601 6.4592 1.7459
kodim11.bmp 6.7261 6.8206 6.7643 7.0216 7.8221 8.1056 8.2492 1.8411
kodim12.bmp 4.7911 4.8718 4.8204 4.9863 5.6651 6.005 6.0748 1.5161
kodim13.bmp 10.8676 10.9428 10.925 11.4237 12.402 12.7139 12.9978 4.1355
kodim14.bmp 8.3034 8.3883 8.3398 8.6722 9.4258 9.896 10.8481 2.4191
kodim15.bmp 5.8233 5.9525 5.8568 6.0862 6.6749 7.3085 7.4932 1.6236
kodim16.bmp 5.0593 5.1629 5.0863 5.2851 5.8093 6.3361 6.1592 1.546
kodim17.bmp 5.5019 5.6127 5.5313 5.7358 6.4975 6.7395 6.8989 1.7166
kodim18.bmp 7.9879 8.0897 8.0192 8.3716 9.7744 9.5357 9.7857 2.9802
kodim19.bmp 6.5715 6.652 6.6692 6.91 8.0128 7.9229 8.0096 2.0518
kodim20.bmp 5.4533 5.5303 5.4895 5.6864 6.3457 6.4878 6.8629 1.5359
kodim21.bmp 7.1318 7.2045 7.1724 7.4582 8.1637 8.4703 8.6508 2.0659
kodim22.bmp 6.43 6.5127 6.4644 6.7137 7.8264 8.0046 7.9488 2.2574
kodim23.bmp 4.8995 5.0098 4.9244 5.0906 5.6989 6.3057 6.888 1.3954
kodim24.bmp 8.4271 8.5274 8.4699 8.8564 9.3906 9.9389 10.5156 2.4977
clegg.bmp 14.9733 15.2566 15.1755 15.7168 16.3563 21.471 32.7192 10.5426
FRYMIRE.bmp 10.7184 12.541 12.132 12.8278 12.989 16.7308 28.9283 6.2394
LENA.bmp 7.138 7.2346 7.1763 7.4264 8.1203 8.742 9.5143 4.288
MONARCH.bmp 6.5526 6.6292 6.5949 6.846 7.5162 8.1053 8.6993 1.6911
PEPPERS.bmp 6.3966 6.5208 6.4557 6.677 7.3618 8.1855 8.8893 2.3022
SAIL.bmp 8.3233 8.3903 8.3598 8.6627 9.8685 9.7838 10.5673 2.9003
SERRANO.bmp 6.3508 6.757 6.8385 7.9064 7.5303 9.0549 18.3631 4.6489
TULIPS.bmp 7.5768 7.656 7.6146 7.8786 8.4084 9.3817 10.5873 2.2228

Back to comparing the DXT1 encoders. BTW the test set here is the Kodak image set plus the Waterloo Bragzone image set. The Kodak set is all photographs that are pretty noisy, and there's not a huge difference in the coders. The Bragzone image set has some synthetic images with things like gradients which are harder to compress well, and there you can really dramatically see the bad encoders fall apart. In particular if you look at the results on "clegg" and "frymire" and "serrano" you can see how bad the "FastDXT" coder is.

The "Squish" in the table is the iterative cluster fit with uniform weighting. All coders work on linear RGB error; and the MSE is mean per pixel not per component.

The "CB" encoder is a simple 2-means fit. I seed the means by doing the PCA to find the best fit line. I put a plane perpendicular to that line through the average and take all the points on each half to be the two clusters, average the cluster to seed the means, and then iteratively refine the means by reclustering. Once I have the 2-means I do a simple search to find the best 565 DXT end points to find the two means. There are 3 cases to try :

1. put c0 and c1 = the two means

2. put c2 and c3 = the two means (so c0 and c1 are pushed out)

3. make (c0,c2) straddle mean 1 and (c3,c1) straddle mean 2 - this is best for gaussian clusters around the mean

A 2-means like this is slightly better than doing a pca "range fit" like Simon's fast method or the "ryg" method. If the data were all Gaussian noise, they would be equivalent, but of course it's not. You often get blocks that have a bunch of pixels at the low end which are all exactly the same color (for example, all perfect black), and then a spread of a bunch of junk at the high end (like some orange, some yellow, etc.). You want to put one end point exactly on perfect black and put the other endpoint at the center of the cluster on the high end.
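A minimal sketch of that 2-means (hypothetical code, not my actual encoder; it assumes the PCA axis has already been computed and is passed in, and "Vec3" / "two_means" are made-up names for illustration) :

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float dist2(Vec3 a, Vec3 b)
{
    Vec3 d = { a.x - b.x, a.y - b.y, a.z - b.z };
    return dot3(d, d);
}

// Seed two clusters by splitting at a plane through the average,
// perpendicular to the (precomputed) principal axis, then refine with a
// few reclustering passes. Outputs the two cluster means.
void two_means(const Vec3* colors, int n, Vec3 axis, Vec3* m0, Vec3* m1)
{
    float avg = 0;
    for (int i = 0; i < n; i++) avg += dot3(colors[i], axis);
    avg /= n;

    Vec3 mean[2] = { {0,0,0}, {0,0,0} };
    int cnt[2] = { 0, 0 };
    for (int i = 0; i < n; i++)
    {
        int c = dot3(colors[i], axis) > avg;   // which side of the plane
        mean[c].x += colors[i].x; mean[c].y += colors[i].y; mean[c].z += colors[i].z;
        cnt[c]++;
    }
    for (int c = 0; c < 2; c++)
        if (cnt[c]) { mean[c].x /= cnt[c]; mean[c].y /= cnt[c]; mean[c].z /= cnt[c]; }

    for (int iter = 0; iter < 4; iter++)   // iteratively recluster
    {
        Vec3 sum[2] = { {0,0,0}, {0,0,0} };
        int cnt2[2] = { 0, 0 };
        for (int i = 0; i < n; i++)
        {
            int c = dist2(colors[i], mean[1]) < dist2(colors[i], mean[0]);
            sum[c].x += colors[i].x; sum[c].y += colors[i].y; sum[c].z += colors[i].z;
            cnt2[c]++;
        }
        for (int c = 0; c < 2; c++)
            if (cnt2[c]) mean[c] = { sum[c].x/cnt2[c], sum[c].y/cnt2[c], sum[c].z/cnt2[c] };
    }
    *m0 = mean[0]; *m1 = mean[1];
}
```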

"CB opt" and "Squish opt" take the results of the CB and Squish compressors and then improve them by iterative search. Simon Brown on his page mentions something about doing greedy endpoint refinement but claims it "doesn't converge because the indeces change". That's silly, of course it converges.

To do a greedy search : try wiggling one or both end points in 565 space. Find new best index fit for the new end points. Measure new error. If the error is lower, take the step.

Of course that works and it does converge and it's pretty simple and improves the encoding after Squish.
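Here's a toy version of that greedy search in 1D grayscale, just to show the shape of the loop (illustrative code, not the actual optimizer; a real version wiggles two 565 endpoints across three channels and uses the real DXT1 palette rounding) :

```cpp
#include <cassert>
#include <cstdlib>

// 1D stand-in for the DXT1 block error : the palette is the two endpoints
// plus the two 1/3 interpolants, and each pixel maps to its nearest entry.
static double block_error(int e0, int e1, const int* pix, int n)
{
    int pal[4] = { e0, e1, (2*e0 + e1)/3, (e0 + 2*e1)/3 };
    double err = 0;
    for (int i = 0; i < n; i++)
    {
        int best = abs(pix[i] - pal[0]);
        for (int j = 1; j < 4; j++)
            if (abs(pix[i] - pal[j]) < best) best = abs(pix[i] - pal[j]);
        err += (double)best * best;
    }
    return err;
}

// Greedy refinement : wiggle each endpoint by +-1, re-measure the block
// error with the best new indices, and keep any step that lowers it.
// Converges because the error strictly decreases and is bounded below.
void greedy_refine(int* e0, int* e1, const int* pix, int n)
{
    double cur = block_error(*e0, *e1, pix, n);
    bool improved = true;
    while (improved)
    {
        improved = false;
        const int step[4][2] = { {1,0}, {-1,0}, {0,1}, {0,-1} };
        for (int s = 0; s < 4; s++)
        {
            int t0 = *e0 + step[s][0], t1 = *e1 + step[s][1];
            double e = block_error(t0, t1, pix, n);
            if (e < cur) { cur = e; *e0 = t0; *e1 = t1; improved = true; }
        }
    }
}
```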

In fact you can do even better by doing simulated annealing instead of a plain greedy search. We should know by now that any time we have a greedy search like this where we can measure the error and it can get stuck in local minima, we can improve it with something like simulated annealing. I use 256 steps of annealing with a sinusoidal decreasing temperature pattern. I randomly pick a way to wiggle the two end points (you need to consider wiggling both, not just single wiggles). I try the wiggle and measure the error delta. Negative errors (improvements) are always taken, positive errors are taken probabilistically based on the temperature. This is what was done in the "opt" coders above.

Most of the time the annealing search just makes a small improvement, but once in a while (such as on "frymire" and "serrano") it really finds something remarkable and makes a big difference.
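The annealing loop itself is tiny. A sketch (again toy code, not the real thing : the error function is a made-up quadratic stand-in, the schedule here is linear rather than sinusoidal, and rand() stands in for a real RNG) :

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Made-up stand-in for the DXT1 block error of endpoint pair (a,b).
static double toy_err(int a, int b)
{
    return (a - 10.0)*(a - 10.0) + (b - 40.0)*(b - 40.0);
}

// 256 annealing steps : random wiggles of one or both endpoints,
// improvements always taken, regressions taken with probability
// exp(-dE/T). Tracks and returns the best state ever seen.
static double anneal(int* e0, int* e1, double (*error)(int, int))
{
    int cur0 = *e0, cur1 = *e1;
    double cur = error(cur0, cur1);
    double best = cur;
    for (int step = 0; step < 256; step++)
    {
        double T = 40.0 * (256 - step) / 256.0;   // decreasing temperature
        int t0 = cur0 + (rand() % 3) - 1;         // wiggle by -1, 0 or +1
        int t1 = cur1 + (rand() % 3) - 1;
        double dE = error(t0, t1) - cur;
        if (dE < 0 || rand() / (double)RAND_MAX < exp(-dE / T))
        {
            cur += dE; cur0 = t0; cur1 = t1;
            if (cur < best) { best = cur; *e0 = cur0; *e1 = cur1; }
        }
    }
    return best;
}
```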

Simon's cluster fit alg is very sweet. I didn't really understand it at first from the description of the code, so I'm going to redescribe it for myself here.

The basic idea is you find an assignment of indices, and from that solve a best fit to give you the end points for those indices. If you think about all the colors in the block living in 3D color space, assigning indices is like giving labels to each point. All the points with the same label are a "cluster". Then each cluster is fit with either an end point or one of the interpolated points.

Once you know the clusters, finding the best two end points is a simple linear optimization. You can solve the least squares analytically; it's not an iterative matrix least squares problem or anything.
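Concretely : if pixel i has interpolation weight t_i in {0, 1/3, 2/3, 1} given by its index, you're minimizing sum_i ((1-t_i)*A + t_i*B - x_i)^2, and the 2x2 normal equations solve in closed form. A per-channel sketch (illustrative code, not Squish's actual implementation, which also folds in its metric weighting) :

```cpp
#include <cassert>
#include <cmath>

// Solve min over (A,B) of sum ((1-t[i])*A + t[i]*B - x[i])^2 analytically.
// Returns false when the 2x2 system is degenerate (e.g. all t[i] equal).
static bool solve_endpoints(const double* x, const double* t, int n,
                            double* A, double* B)
{
    double aa = 0, ab = 0, bb = 0, ax = 0, bx = 0;
    for (int i = 0; i < n; i++)
    {
        double u = 1.0 - t[i];
        aa += u * u;        // sum (1-t)^2
        ab += u * t[i];     // sum (1-t)*t
        bb += t[i] * t[i];  // sum t^2
        ax += u * x[i];     // sum (1-t)*x
        bx += t[i] * x[i];  // sum t*x
    }
    double det = aa * bb - ab * ab;
    if (det == 0.0) return false;
    // normal equations : aa*A + ab*B = ax ; ab*A + bb*B = bx
    *A = (bb * ax - ab * bx) / det;
    *B = (aa * bx - ab * ax) / det;
    return true;
}
```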

So the trick is how to decide the indices. If you tried them all, it would be 4^16 ways, which is way too many. So what Simon does is create a linear ordering of the points using the PCA axis, then try all clusterings that can be created by planes perpendicular to that linear axis. That's equivalent to just trying all groupings of the indices sorted by their dot along the PCA axis. That is, the clusters are

[0,i) , [i,j) [j,k) [k,16)

for all i,j,k
which is a manageable amount to search, and gives you most of the interesting clusterings that you care about. Something that might improve Squish is trying a few different axes and picking the best.
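For scale : with 16 pixels that's only 969 orderings to try (the number of ways to pick 0 <= i <= j <= k <= 16), versus 4^16 = about 4.3 billion unrestricted index assignments. A quick check :

```cpp
#include <cassert>

// Count the contiguous clusterings [0,i) [i,j) [j,k) [k,n) along the
// sorted PCA ordering, i.e. triples with 0 <= i <= j <= k <= n.
static int count_cluster_splits(int n)
{
    int count = 0;
    for (int i = 0; i <= n; i++)
        for (int j = i; j <= n; j++)
            for (int k = j; k <= n; k++)
                count++;
    return count;
}
```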

BTW this end point optimization is very approximate. One issue is that it assumes the best index for each point doesn't change. It also of course just uses real number arithmetic to make the 1/3 points, not the actual integer rounding that's done. Those factors are actually pretty big.

11-18-08 - DXTC

I've been doing a little work on DXT1 compression. The main reason I'm looking at it is because I want to do something *better* than DXT1, and before I do that, I want to make sure I'm doing the DXTC right.

For now let's gather prior art :

The only real decision the coder gets to make is what the two 565 endpoints are. The other details you just have to get right - reconstruct the palette interpolants right, and assign the 16 indices to the closest palette entries right. For the moment I don't care about speed, I just want to make sure the indices are actually the best choices.

All the papers by Ignacio and J.M.P. van Waveren are modern classics. One thing of note : the "simplified" index finder that JMP has in his original paper is wrong. On page 12 he says "Evaluating the above expression reveals that the sub expression ( !b3 & b4 ) can be omitted because it does not significantly contribute to the final result" - apparently that is not true, because when I use the "rewritten" index finder it produces bad indexes. His original version of the index finder bit twiddle is fine.

One thing I've found with the index finding is that the integer and nonlinear effects are pretty strong and any ideas you have about planar geometry fail if you assume that your 4 points are collinear (they are not actually). For example, an obvious idea is to do a dot product test to pick between the 4 points. First dot product with the middle separating plane, then the two planes between the next pairs of points. This doesn't work. Part of the problem is because of the bad rounding in the generation of the 1/3 point. The JMP method is actually comparing vs the true distances, so that part is good.

In more detail, an obvious idea is to do something like this :

// pseudocode : color, c0, c1 are RGB vectors and '*' is a dot product
int dot = (color - c0) * (c1 - c0);

// mid01, mid02, mid31 are the dots of the midpoints between palette pairs
int bmid = (dot > mid01);
int b02 = (dot > mid02);
int b31 = (dot > mid31);

// using the DXT1 palette order label 0,2,3,1

int threebit = (bmid<<2) + (b02<<1) + b31;

int index = remap[threebit];

// remap is an 8-entry table that remaps from 3 bits to 2 bits to give an index

That "works" just as much as the "ryg" code works or the "Extreme DXT" method of finding indexes work - which is to say that it doesn't work.

FastDXT appears to be an implementation of the id "Real Time DXT" paper.

Ericsson Texture Compression (ETC2) is similar to DXTC but different; this is a pretty good paper, there are some interesting ideas here like the T and H blocks. It gets slightly better quality in some cases. It's obvious that you can beat DXTC by having a base color and a log-scaled delta color, rather than two end points. The two 565 end points are wasteful; you could for example do a 777 base color and a 444 log scaled delta.

Tight Frame Normal Map Compression is similar to DXT5 (really to 3Dc) with some small improvement on normal maps.

Extreme DXT Compression by Peter Uliciansky is indeed super fast. However, it has some errors. The division method that he uses to assign the indices is not correct. You can improve it by offsetting the end points correctly to put the division quantization boundaries in the right place, but even then it's still not exactly correct unless the diagonal of the color bbox is along {1,1,1}. The error from this is pretty small and he's going for max speed, so it's semi-reasonable what he does. Note that using the full range of the bbox for the division but insetting it to make the DXT1 colors (as he does) improves the indices slightly, but it's not actually the correct range scale.

In more detail :

To find indices for the DXTC colors c0 and c1

You should use

base = c0 + (c0 - c1)/6

range = (c1 - c0) * 4/3

index = (color - base) * 4 / range;

which is equivalent to :

index = (color - base) * 3 / (c1 - c0);

not :

index = (color - c0) * 4 / (c1 - c0);

But even if you do that it's still wrong because the planes through color space do not actually separate the colors from their closest palette entry.
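In 1D the corrected division method does work; it's only the 3D, non-collinear reality that breaks it. A sketch of the 1D version (hypothetical code, assuming c1 > c0 and real arithmetic rather than the hardware's integer rounding) :

```cpp
#include <cassert>

// Quantize a scalar color against endpoints c0 < c1 using the offset base
// and scaled range from above, so the bin boundaries land midway between
// the palette entries c0, c0 + 1/3, c0 + 2/3, c1. Then remap the linear
// position to DXT1's index order (c0, c2, c3, c1).
static int find_index_1d(double color, double c0, double c1)
{
    double base = c0 + (c0 - c1) / 6.0;           // half a bin below c0
    int pos = (int)((color - base) * 3.0 / (c1 - c0));
    if (pos < 0) pos = 0;
    if (pos > 3) pos = 3;
    const int remap[4] = { 0, 2, 3, 1 };          // linear position -> index
    return remap[pos];
}
```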

MSDN compressed format documentation now has the right rounding for the 1/3 points, but the old DirectX8 docs have the wrong rounding. The old docs said

color_2 = (2*color_0 + 1*color_1 + 1)/3
color_3 = (1*color_0 + 2*color_1 + 1)/3

Which is actually better, but is not what the standard or the hardware does.
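Side by side, the two roundings (just to show that they differ; which one a given piece of hardware actually implements is exactly the ambiguity being complained about here) :

```cpp
#include <cassert>

// 1/3 interpolant per the old DirectX8 docs (biased by +1)...
static int third_point_old_docs(int c0, int c1) { return (2*c0 + c1 + 1) / 3; }

// ...versus plain truncating division, with no bias.
static int third_point_trunc(int c0, int c1) { return (2*c0 + c1) / 3; }
```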

Simon Brown's Squish is very good. His Cluster Fit idea is elegant and clever and produces quite good quality.

There's a thread on MollyRocket about DXTC with some convenient code from "ryg". It's nice simple code, but it has some errors. The way he finds the indices is the dot product method, which is wrong. Also the way he computes the 1/3 and 2/3 colors for the palette using LerpRGB is wrong; it rounds differently than the hardware really does.

One thing that some simple code gets wrong is the way to quantize from 24 bit to 565. You can't just take the top bits, because the 565 values are not reconstructed to the middle of their bins. You need to correct for the offsetting. You can either use the true range, which is something like -4 to 259, or you can skew the bits in the quantizer. The "ryg" code has a perfect quantizer that's very neat in the "As16Bit" function. An example is that the value 250 should map to 30 in 5 bits, not 31.
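A round-to-nearest 8-to-5-bit quantizer can be written as a small integer expression. This sketch is not ryg's actual As16Bit code, just one straightforward way to get the correct behavior on the example above :

```cpp
#include <cassert>
#include <cstdlib>

static int quant5_topbits(int v) { return v >> 3; }   // the naive, wrong way

// Round to nearest : pick the 5-bit value whose reconstruction (~ q*255/31)
// is closest to v. (Exact ties can land either way; that's fine.)
static int quant5_nearest(int v) { return (v * 31 + 127) / 255; }

static int dequant5(int q) { return (q * 255 + 15) / 31; }   // 5 -> 8 bits
```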

BTW the YCoCg-DXT5 method is pretty good on current hardware, but I'm not really looking at it. It's obviously inherently wasteful, it's just a way of making use of the current hardware, but it's not really what you'd want to be doing. It's also very large, at around 2.6 bits per byte (8 bits per pixel).

More advanced topics to come.


11-13-08 - Links

Top Chef the show is pretty meh as a show; I'm a sucker for food shows, but god it's such a cheesy standard reality show with all the fake drama and manipulation. I have to watch it though so that I can understand the blogs . The blogs are reliably hilarious; Tom's is generally good, and then there will be other tasty ones by the likes of Tony Bourdain, Richard's is often good, Harold is good, and whoever the guest celebrity chef is sometimes writes a good one.

Simon Brown has a nice DXTC library, and some other good stuff on his page; a good page on spherical harmonics, for example.

Shawn Hargreaves has a good blog that's mainly on introduction-to-game-development kind of stuff, mostly with XNA. I don't know how many commercial people are actually using the whole XNA studio thing, but I guess I might have to make Oodle work with their "Content Pipeline" thing.

Wolfgang Engel's blog is pretty good. Apparently he made an iPhone game SDK.


11-11-08 - Poker Taxes

There's a story going around that the winner of the WSOP will be hit by a 73% tax rate in his home country of Denmark. Poker players roundly call it "ridiculous" and "awful" and say what's the point of winning blah blah.

I'm not sure exactly where I stand on the issue. I do think it's a bit retarded that gambling is taxed even higher than income in some cases. It should at most be taxed as high as stocks. I can see a certain argument that gambling shouldn't be taxed - I mean if you imagine two guys each put in $50 and flip a coin to see who gets the whole $100 - if you take 50% taxes off that, it means the winner only gets $50 ? (well, not exactly, but you get the point, kind of). Why should it be taxed at all, it's just a transfer of money? But by that logic you could argue for not taxing anything.

Anyway, that's not the point. The interesting thing is that all the people who are so upset about this "travesty of excessive taxation" are horrible poor low level gamblers who will never win anything and never be subject to these taxes, but they're still really pissed on behalf of the winner. It finally made me realize where that sentiment comes from. It's the same thing that makes all these morons play the lottery. They are not thinking in their own best interest. They are far better off when the rich pay high taxes, because these dumb fucks will never be rich. But they want to be able to dream. They've basically given up on actually improving their lives in a real way. They have given up logic and action, and they now only live in fantasy. Fantasies where they might win the WSOP Main Event and get rich, fantasies where they might win the lottery, fantasies where they can buy a house on a subprime mortgage and it will appreciate fantastically and they'll get rich. And they want the taxes to be low so that when they get all this money in their fantasy, they get to keep it all.


11-11-08 - REYES

Gritz' course on Renderman : To Infinity and Beyond is the best reference I've found on REYES. I'm still curious about the details of how to efficiently do the part from micropolygons to pixel colors.

The brute force way takes a micropolygon and puts a bound on it that cover its entire temporal sweep across the current frame. That is then binned into all the pixels it touches. In each pixel, 16 (or so) supersampling locations are checked with precomputed jittered sample locations and times. At each spot where a sample is found, the color & Z are added to a list for later blending. Note with fast moving objects and depth of field, it can get really inefficient, because micropolygons have to get tested in many pixels.
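A toy sketch of that sampling step (all names are mine; a real implementation uses screen-space bounds, per-pixel precomputed jitter tables, and point-in-polygon tests rather than this axis-aligned stand-in):

```cpp
#include <cassert>

// Each pixel has ~16 precomputed jittered sample locations, each with its
// own time within the frame. A moving micropolygon is tested at each
// sample's time; hits would append {color, Z} to the pixel's list.
struct Sample { float x, y, t; };

// Micropolygon stand-in: an axis-aligned quad at t=0 that moves linearly
// by (dx, dy) over the frame.
struct MicroPoly { float x0, y0, x1, y1; float dx, dy; };

// Does the micropolygon cover this sample at the sample's own time?
bool hits(const MicroPoly& mp, const Sample& s)
{
    float ox = mp.dx * s.t, oy = mp.dy * s.t;  // position at the sample time
    return s.x >= mp.x0 + ox && s.x <= mp.x1 + ox &&
           s.y >= mp.y0 + oy && s.y <= mp.y1 + oy;
}

int count_hits(const MicroPoly& mp, const Sample* samples, int n)
{
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (hits(mp, samples[i])) ++c;
    return c;
}
```

This is also where the inefficiency comes from: a fast-moving micropolygon's bound covers many pixels, so it gets tested against many samples it never actually touches.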

The Pixar Library is a nice collection of papers. There's a new paper I hadn't seen on point-based GI that's based on the Bunnell disk method for realtime GI but enhanced for better quality. It seems extremely simple and the results look good. It's not super accurate in terms of solving the rendering equation with minimal error, but that's not really what we want anyway. Though actually now that I think about it it's just a variant on the ancient "render the world from each surfel's point of view" method for GI and doesn't really add much.

Anyway, back to REYES. Ignoring temporal sampling, the spatial sampling stuff can all be done with a normal rasterizer I'm pretty sure. The micropolygon subdivision and binning is just your fragment rasterizer (a rasterizer is just a device that creates micropolygons that are at most pixel size and makes a grid that's aligned with the screen grid). When you decide a fragment goes in a pel, you don't just stuff the color in the frame buffer, instead you stick it in the list to sample against the stochastic supersampled grid test just like REYES.

But it occurs to me that if you're building up a list of fragments in each pel like this, you may as well do analytic coverage rather than stochastic sampling. When you rasterize your triangles, compute the area of each pel that's covered. This can be done efficiently and incrementally, just like a Wu antialiased line drawer, for the case of only one triangle edge intersecting a pel; if 2 or 3 edges intersect one pel it's more work.

Add the fragment to the pel list, consisting of {color, alpha, Z, area}. Once you have all these fragments, sort them back-to-front and accumulate into the framebuffer. Fragments that have the same Z are first gathered together before being alpha-blended onto the current value in the frame buffer. That means that two triangles that share an edge act like a continuous surface and will correctly give you 100% area coverage and will correctly not alpha blend with each other. This gives you exact anti-aliasing with alpha blending.
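A minimal sketch of that per-pixel resolve, on a single scalar color channel (names and structure are mine, not from any Renderman implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each fragment carries {color, alpha, Z, area}. Fragments at the same Z
// are gathered (summing their coverage) before one alpha-blend over the
// running pixel value, so two triangles sharing an edge act like a single
// continuous surface instead of blending against each other.
struct Frag { float color, alpha, z, area; };

float resolve_pixel(std::vector<Frag> frags, float background)
{
    // back-to-front: larger Z = farther away, composited first
    std::sort(frags.begin(), frags.end(),
              [](const Frag& a, const Frag& b) { return a.z > b.z; });

    float out = background;
    size_t i = 0;
    while (i < frags.size())
    {
        // gather all fragments at the same depth into one surface
        float color = 0.0f, cover = 0.0f;
        size_t j = i;
        for (; j < frags.size() && frags[j].z == frags[i].z; ++j)
        {
            color += frags[j].color * frags[j].alpha * frags[j].area;
            cover += frags[j].alpha * frags[j].area;
        }
        // one blend for the whole gathered surface
        out = color + out * (1.0f - cover);
        i = j;
    }
    return out;
}
```

For example, two opaque fragments at the same Z each covering half the pel give 100% coverage and completely replace the background; blending them one at a time would wrongly let 25% of the background leak through.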

11-11-08 - Blah Blah

Dan's Data Review of the Dell 30 ; Dan's got a nice site, lots of good value.

iHologram anamorphic perspective demo is totally fucking rad. Found from Kokoromi

There's tons of videos on Youtube of realtime raytracers, voxel engines, and cool CUDA stuff. Galaxies collide with CUDA , Voxelstein , and a cool compilation .

11-11-08 - Mapped Drives

Is there a way to make "Disconnected Network Drives" try to connect from DOS or from code? I'm talking about when you "net use" to make a network path to a drive letter, but you reboot or something so it's disconnected. It does a sort of lazy connect, where if you go click on it in Explorer it will hook right up, but if you go try to dir it from a CLI before you explore to it, you can't get to it.

I'd like to make my CLI file copy utils do something like "if target is a disconnected network drive, wake it up first".

The answer seems to lie somewhere in the functions RealDriveType, IsNetDrive, and WNetRestoreConnection.

These are functions that the DOJ forced MS to expose and the documentation is total balls. This page has better info than MSDN :


I'm seeing some weird things with them though. For one thing, IsNetDrive is not a light query. It actually stalls and tries to wake up the drive right there, so most of the time all you have to do is call IsNetDrive and that actually will restore the connection. IsNetDrive can stall out for a very long time.

IsNetDrive also lies, as that page I linked details.

I haven't gotten WNetRestoreConnection to do anything useful. It always tells me "The local device name is already in use".

WNetRestoreConnection also pops up an interactive dialog box when it has errors which is fucking evil.

So ... in every case I've tried so far IsNetDrive actually refreshes the connection. I haven't yet found a need to actually call WNetRestoreConnection, which is good because it doesn't seem to work.

11-10-08 - Junk

Guy La Liberte is basically single handedly subsidizing the top online poker players. He's dumping 20-30 mil a year to something like 10-20 people. Sometimes I see how bad he is and think I should take a shot at the $500/$1000 NLHE games, but of course the issue is not beating Guy, it's beating the other guys who are feeding off Guy, like durrr or PA.

Hamburger has to be at 15% fat or it's just gross. The "lean" stuff at 10% is dry and bland. The budget stuff is usually around 20%, and that's too greasy. I could buy both and mix, but that means I have to get an awful lot. I guess I might have to get into grinding my own.

One of the fun things about game development is that you only have to finish things 99% of the way. The last little bit of programming is really not fun. The initial getting it working to 90% is really fun, and then for games you just fix some bugs and polish a tiny bit - but you only have to make it work in the way it's used in the game, not really robustly. If it has some quirks under certain uses, that's fine, just don't use it that way.

Programmers all talk about completion percentages in a really distorted scale. I just did it in the last paragraph. I sort of have to do it, because everyone does. It's common parlance to say "oh yeah, I implemented self-balancing binary trees, I got it like 90% done in an hour". Of course that's nonsense. Any good experienced programmer knows that you've actually done maybe 10% of the total work needed to make it a finished functioning efficient robust piece of code. But it appears to be 90% done because hey you made a bunch of characters in a file that ends in "cpp" so you say 90%. Then when something is pretty solid we say 95% or 99% when in reality it's highly likely there will be a few more nasty issues which will take almost as much time as you've put in already, so you're only about 50% done.

When somebody tells you they're "x%" done, you should raise that to the 10th power. So "90% done" = "0.9^10" = 35% done. "99% done" = 90% done, etc.

I stumbled on this suggestion to compare floats using their int difference, which counts how many representable float values apart they are. I wonder if this is actually a robust way to do float math with tolerances. Christer? ... actually I'm sure it's not. It does seem like an okay way to do the relative epsilon compare, but it is not a replacement for also having an absolute epsilon check.
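A sketch of that int-difference compare (a ULP distance), combined with the absolute-epsilon check that's still needed near zero (function names are mine):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// IEEE-754 floats of the same sign keep their ordering when reinterpreted
// as ints, so after remapping negative values the int difference counts
// how many representable floats apart two values are.
static int32_t float_to_ordered_int(float f)
{
    int32_t i;
    std::memcpy(&i, &f, sizeof(i));               // no aliasing-cast tricks
    return (i < 0) ? (int32_t)0x80000000 - i : i; // make ordering monotonic
}

bool nearly_equal(float a, float b, int max_ulps, float abs_eps)
{
    if (std::fabs(a - b) <= abs_eps)    // absolute check: handles values
        return true;                    // straddling or very near zero
    // 64-bit difference so widely separated values can't overflow
    int64_t d = (int64_t)float_to_ordered_int(a) - float_to_ordered_int(b);
    return d >= -max_ulps && d <= max_ulps;
}
```

The ULP part alone fails near zero: tiny positive and tiny negative values can be an enormous number of representable floats apart even though they're numerically negligible, which is exactly why the absolute epsilon is not optional.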

And What Every Computer Scientist Should Know About Floating-Point Arithmetic , by David Goldberg. It's really long and old, but quite excellent. And actually the oldness is good because it means he talks about floating point very generally, since the old mainframes had lots of different weird implementations.

"The Life and Times of Tim" is horrifically not funny. It's like the love child of "Dr. Katz" and "Southpark" , which is like interpolating between boring and retarded.


11-10-08 - Bleck

I'm still really sick. I know for sure a doctor is not going to do a damn thing for me, but I'm starting to feel like I have to go just so that I'm trying to do something about this. I hate having problems and not doing anything about it.

I have all the symptoms of a bacterial infection, except that I can't find a specific site of infection, and my fever's not super high. The feeling really reminds me of when I had those infected spider bites though. Maybe I could prescribe myself some antibiotics. I'm super out of it, faint, weak, have a headache, really achy muscles everywhere, like my eye muscles really hurt. My fever's like 99, but I normally run low, like 97.5 (btw oral temperature is so massively inaccurate, it's silly that doctors have these thresholds like 100.4 for antibiotics. your temperature can vary by 1 degree throughout the day; it varies massively based on whether you've been wrapped up in blankets, if you exercised, if you just ate hot food or drank cold water, etc. etc.).

I'm trying to make sure I know a hospital to go to in case I collapse. There are literally 5 hospitals on Capitol / First hill, but not a single one of them is public & subsidized. WTF. Seattle public services really sucks complete balls. SF's got a subsidized public hospital that's completely free if you're poor and uninsured. Sure you have to sit next to the homeless crack heads in the waiting room, but the wait's not even that bad unless you go on a Friday night.

I realized yesterday that I've passed out an awful lot in my life. I passed out in Mammoth once because I drank too much beer and sat in the hot tub too long. That night I got up to go pee and just collapsed. No biggie. Then twice in SLO, once when I had a really bad cold and a high fever and we thought I might have pneumonia; I went to the hospital and they told me I was a baby wasting their time and to get the fuck out. Then another time with my infected spider bites. When I went to the hospital then they couldn't believe I'd let it go so long. Then again in San Francisco just from some kind of cold. I walked into the kitchen and felt really dizzy and was trying to steady myself and just suddenly passed out cold, I fell completely limp like a ton of fleshy bricks and hit my head on the stove. That was scary. Oh yeah - and one time in a Rugby game when I caught a kick and was running in the open field and took a tackle really badly. I was out cold and people gathered around to hold up fingers for me to count. And I got right back in the game because our team never had enough reserves and I stupidly wanted to show I was tough enough.

I'm a little confused about what to do about fevers now. Modern understanding is that most of the bad problems from infection are actually caused by the body's overreaction and excessive fever, which is what causes organ failure and tissue destruction and so on. On the other hand, modern understanding also says that you should not take fever reducers for normal illnesses, because the fever does help kill things off faster. And in fact some people say you should bundle up and try to get extra warm to help your body be warm and kill the infection. I guess as long as the fever is under 100 you should let it be.

BTW I just put together that sepsis = bacterial infection, and "septic tank" is not called "septic" referring to the nasty poo inside, but rather the fact that there's bacteria inside.


11-07-08 - Reviews

The Bird People in China - quite magical; a sort of allegory, in the style of the magical realism books; somewhere between an amazing adventure and a daydream. It does come off the rails a bit (the repeating of that song over and over is very tiresome), but overall I found it delightful and mesmerizing. Partly because the setting is just so ridiculously beautiful and other-worldly. It's by Miike who is now one of my absolute favorite directors (Last Life in the Universe, Gozu, Audition). Most of Miike's work seems to be shitty mass-market Yakuza flicks for the Japanese market, but he's also got some of this really weird surreal stuff that I love.

Spain : On the Road Again. Hello!? Gwyneth Paltrow? WTF is this bitch doing here? She's a boring drag, she has zero charisma, she knows nothing about food or Spain, she's really not very attractive either, they're fucking driving around in Mercedes and having retarded conversations about nothing. They're showing me Gwyneth Paltrow getting a fucking spa treatment at a fancy hotel, WTF!? Ugh. I like Batali and Bittman, and I am in love with Spain right now, and I love food travel shows, but please please just focus on the men and the food.

Fleet Foxes. I know I reviewed this before, but I want to write more. Fleet Foxes are often now smugly described as what would've happened if the Beach Boys hung out in the woods. Yeah, that's sort of true, but the songs are far more bare and raw and emotional than the Beach Boys. Yes, the four part harmony is similar and just as magical, but Fleet Foxes uses it better, because Robin's voice on its own is so beautiful and earnest, then when the four part harmony kicks in it's like an angel's chorus, it's literally a dopamine explosion inside your head as it hits some primal happy nerves. I find the Fleet Foxes album shares a lot with the Arcade Fire actually in terms of the raw exuberance, the ecstatic joy of music. The songs tell of all the moods of the country. You feel like you're out in the middle of nowhere living in an ancient pastoral life. The hope of the sunrise on a country morn. The lazy freedom of strolling through the fields on a hot summer day. The insular closed in togetherness of long country nights. The stark loneliness of snowed in winter. The darkness and mystery of the deep woods. The occasional visit to far away friends, the connection to family through the generations, and all the quiet time for contemplation in the rain.

A Streetcar Named Desire - Finally saw the movie. I've never read the play or anything, but I've seen the Simpsons musical spoof of Streetcar and so many other references that I knew all the lines and pretty much exactly knew the story already. It's actually quite amazing. The thing is, the direction is horrible, it's so cheesy and over-wrought, as horrible plays tend to be, and all the acting is just atrocious - *except* for Brando. Everyone is stuck in this ridiculous theatrical style (and early movie style) of over-acting, and Brando is just there, totally casual. His first scene he just barely mumbles out the words as if he couldn't be bothered to enunciate. He's so natural, and the contrast of how bad everything else is just makes it even more powerful. Brando's like a quivering ball of masculine energy, it's a perfect role for him, but the movie is only mediocre. Fortunately he did make one great movie in his virile youth before he turned into a fat old weirdo - "On the Waterfront" is the rare classic that holds up to modern viewing.

old rants