string table format?
Moderator: Paul Siramy
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
Does someone know where I can find the file format of the .tbl files (Strings)? Or perhaps the source of a program which reads them?
Thanks
MAKF1127
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Just examine the Perl script that is available on the Keep.
I also have a C++ class that handles it quite well.
"How much suffering, mortal, does it take before you lose your grace?"
Shadow Empire (coming soon) | forum
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
Actually, you get a description of the format from the comments of the Perl script, but the actual hash function is in Perl.
Additionally, there are two word values at offsets +02 and +04 that contain the number of string records in the file. I don't know why there are two values; the second value is never less than the first as far as I have seen. Perhaps the first value is the number of strings and the second is the maximum number of strings the file has ever had.
Ondo and Mephansteras wrote:
# A intro to the string.tbl format.
#
# There are four main sections to the string.tbl file.
# First, the header. This is 21 bytes long.
# Second, an array with two bytes per entry, that gives an index into the next table. This allows lookups of strings by number.
# Third, a hash array, with 17 bytes per entry, which has the pointers to the key and value strings, and has the strings sorted basically by hash value. This allows lookups of strings by key.
# Fourth, the actual strings themselves.
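As a rough illustration of the four sections described in the quoted comments, here is a minimal sketch in C that computes where each section would start. The sizes (21, 2 and 17 bytes) come from the quote; the names TblLayout and tbl_layout are made up for this example and do not appear in the Perl script.

```c
#include <stddef.h>

/* Sizes taken from the quoted description: a 21-byte header,
   2 bytes per index entry, 17 bytes per hash entry. */
enum {
    TBL_HEADER_SIZE      = 21,
    TBL_INDEX_ENTRY_SIZE = 2,
    TBL_HASH_ENTRY_SIZE  = 17
};

/* Hypothetical helper type: byte offsets of the four sections. */
typedef struct {
    size_t header;      /* always 0 */
    size_t index_array; /* lookup of strings by number */
    size_t hash_array;  /* lookup of strings by key */
    size_t strings;     /* the key and value strings themselves */
} TblLayout;

/* Sections are assumed to follow each other without padding. */
static TblLayout tbl_layout(size_t num_index_entries, size_t num_hash_entries)
{
    TblLayout l;
    l.header      = 0;
    l.index_array = TBL_HEADER_SIZE;
    l.hash_array  = l.index_array + TBL_INDEX_ENTRY_SIZE * num_index_entries;
    l.strings     = l.hash_array + TBL_HASH_ENTRY_SIZE * num_hash_entries;
    return l;
}
```

For example, with 10 index entries and 12 hash entries the strings would start at byte 21 + 20 + 204 = 245.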
Do the right thing. It will gratify some people and astonish the rest.
~ Mark Twain
Run Diablo II in any version for mods: tutorial
The Terms of Service!! Know them, abide by them, and enjoy the forums at peace.
The Beginner's Guide v1.4: (MS Word | PDF) || Mod Running Scripts || TFW: Awakening
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
-
- Paladin
- Posts: 160
- Joined: Mon Oct 21, 2002 1:13 pm
- Location: Kansas
Here is the link to the Keep's Tutorial on String Tables:
http://dynamic2.gamespy.com/~phrozenkee ... uettar.php
Hope this helps!
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
Here is a file I once created (in C, although I might have used some C++ extensions) to read string table files in a larger program I wrote. I had some customized file open/close functions and also a memory allocation function, which is why there are double lines of code for those, one of which is marked as a comment (edit: the version I found here seems to have been changed to include the memory allocation and such, so I can give away the file).
I never got around to implementing everything I wanted, so a few things are only half done. I also have minimal comments, but it should hopefully be OK. The good thing is that it does include functions to actually parse the string tables in all the ways the game can, and the file format should be relatively easy to figure out from the variables at the start.
Enjoy!
Edit: This seems to be a slightly old version of the file I have. Not sure if it had any errors though. I will check for the most recent version once I get to the computer that has those files.
In case anyone wonders what the header file looks like, it follows the source listing below.
Code: Select all
// stringtable.cpp
//
// Created by Pedro Faria (Jarulf).
//
// Many thanks to Peter Hatch (Ondo) for information
// about the structure and algorithms regarding
// the file "string.tbl".

// With MSVC precompiled headers, anything before stdafx.h is ignored,
// so it has to come first.
#include "stdafx.h"
#include <stdio.h>  // remove if in stdafx.h
#include <stdlib.h> // remove if in stdafx.h
#include <string.h> // remove if in stdafx.h

enum {
    // File info. Some are not used by program

    // Size info of various sections
    HeaderSize            = 0x15,
    ElementSize           = 0x02,
    NodeSize              = 0x11,

    // Header info location
    CRCOffset             = 0x00, // word
    NumElementsOffset     = 0x02, // word
    HashTableSizeOffset   = 0x04, // dword
    VersionOffset         = 0x08, // byte (always 0)
    StringStartOffset     = 0x09, // dword
    NumLoopsOffset        = 0x0D, // dword
    FileSizeOffset        = 0x11, // dword

    // Element info location
    NodeNumOffset         = 0x00, // word

    // Node info location
    ActiveOffset          = 0x00, // byte
    IdxNbrOffset          = 0x01, // word
    HashValueOffset       = 0x03, // dword
    IdxStringOffset       = 0x07, // dword
    NameStringOffset      = 0x0B, // dword
    NameLenOffset         = 0x0F, // word

    // KeyNums
    StringKeyNum          = 0,
    PatchStringKeyNum     = 10000,
    ExpansionStringKeyNum = 20000
};

static bool IsInit = false;
static char *ptStringTable = NULL;
static char *ptExpansionStringTable = NULL;
static char *ptPatchStringTable = NULL;
static char strStringFilename[] = "string.tbl";
static char strExpansionStringFilename[] = "expansionstring.tbl";
static char strPatchStringFilename[] = "patchstring.tbl";
static char strNameNotFound[] = "Unknown name";
static char strNull[] = "";

////////////////////
// Memory allocation
////////////////////
// just here to make compilation possible
// would be in other file normally

// ptMemx is really the address of a pointer; the new block is stored through it.
static void allocateMemory(void *ptMemx, int sizeMem)
{
    void **ptMem = (void **)ptMemx;

    if ((*ptMem = malloc(sizeMem)) == NULL)
    {
        printf("Error: Can't allocate %d bytes of memory, program terminated.\n", sizeMem);
        exit(EXIT_FAILURE); // report failure instead of exit(0)
    }
} // allocateMemory

// Frees the block and resets the caller's pointer to NULL.
void deallocateMemory(void *ptMemx)
{
    void **ptMem = (void **)ptMemx;

    if (*ptMem != NULL)
    {
        free(*ptMem);
        *ptMem = NULL;
    }
} // deallocateMemory

////////////////////
// Utility functions
////////////////////
static unsigned short getNumElements(char *ptTable)
{
    return *(unsigned short *)(ptTable + NumElementsOffset);
} // getNumElements

static int getHashTableSize(char *ptTable)
{
    return *(int *)(ptTable + HashTableSizeOffset);
} // getHashTableSize

static int getNumLoops(char *ptTable)
{
    return *(int *)(ptTable + NumLoopsOffset);
} // getNumLoops

static char *getptStringStart(char *ptTable)
{
    return ptTable + HeaderSize + ElementSize*getNumElements(ptTable) + NodeSize*getHashTableSize(ptTable);
} // getptStringStart

static char *getptStringEnd(char *ptTable)
{
    return ptTable + (*(unsigned int *)(ptTable + FileSizeOffset));
} // getptStringEnd

static char *getptFirstNode(char *ptTable)
{
    return ptTable + HeaderSize + ElementSize*getNumElements(ptTable);
} // getptFirstNode

static int getNodeNum(char *ptElement)
{
    return *(unsigned short *)(ptElement + NodeNumOffset);
} // getNodeNum

static int getIdxNum(char *ptNode)
{
    // IdxNbr is a word; reading it as an int would pull in
    // the low bytes of the hash value that follows it.
    return *(unsigned short *)(ptNode + IdxNbrOffset);
} // getIdxNum

static char *getptIdxString(char *ptTable, char *ptNode)
{
    return ptTable + (*(int *)(ptNode + IdxStringOffset));
} // getptIdxString

static char *getptNameString(char *ptTable, char *ptNode)
{
    return ptTable + (*(int *)(ptNode + NameStringOffset));
} // getptNameString

static unsigned int getFileSize(char *ptHeader)
{
    return *(unsigned int *)(ptHeader + FileSizeOffset);
} // getFileSize

////////////////
// CRC functions
////////////////
static int calcCRC(unsigned char *ptStart, unsigned char *ptEnd)
{
    unsigned char *ptCur;
    unsigned short CRCValue;
    unsigned short CRCTableEntry;
    static const unsigned short CRCTable[256] = {
0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6, 0x9339, 0x8318, 0xB37B, 0xA35A, 0xD3BD, 0xC39C, 0xF3FF, 0xE3DE,
0x2462, 0x3443, 0x0420, 0x1401, 0x64E6, 0x74C7, 0x44A4, 0x5485, 0xA56A, 0xB54B, 0x8528, 0x9509, 0xE5EE, 0xF5CF, 0xC5AC, 0xD58D,
0x3653, 0x2672, 0x1611, 0x0630, 0x76D7, 0x66F6, 0x5695, 0x46B4, 0xB75B, 0xA77A, 0x9719, 0x8738, 0xF7DF, 0xE7FE, 0xD79D, 0xC7BC,
0x48C4, 0x58E5, 0x6886, 0x78A7, 0x0840, 0x1861, 0x2802, 0x3823, 0xC9CC, 0xD9ED, 0xE98E, 0xF9AF, 0x8948, 0x9969, 0xA90A, 0xB92B,
0x5AF5, 0x4AD4, 0x7AB7, 0x6A96, 0x1A71, 0x0A50, 0x3A33, 0x2A12, 0xDBFD, 0xCBDC, 0xFBBF, 0xEB9E, 0x9B79, 0x8B58, 0xBB3B, 0xAB1A,
0x6CA6, 0x7C87, 0x4CE4, 0x5CC5, 0x2C22, 0x3C03, 0x0C60, 0x1C41, 0xEDAE, 0xFD8F, 0xCDEC, 0xDDCD, 0xAD2A, 0xBD0B, 0x8D68, 0x9D49,
0x7E97, 0x6EB6, 0x5ED5, 0x4EF4, 0x3E13, 0x2E32, 0x1E51, 0x0E70, 0xFF9F, 0xEFBE, 0xDFDD, 0xCFFC, 0xBF1B, 0xAF3A, 0x9F59, 0x8F78,
0x9188, 0x81A9, 0xB1CA, 0xA1EB, 0xD10C, 0xC12D, 0xF14E, 0xE16F, 0x1080, 0x00A1, 0x30C2, 0x20E3, 0x5004, 0x4025, 0x7046, 0x6067,
0x83B9, 0x9398, 0xA3FB, 0xB3DA, 0xC33D, 0xD31C, 0xE37F, 0xF35E, 0x02B1, 0x1290, 0x22F3, 0x32D2, 0x4235, 0x5214, 0x6277, 0x7256,
0xB5EA, 0xA5CB, 0x95A8, 0x8589, 0xF56E, 0xE54F, 0xD52C, 0xC50D, 0x34E2, 0x24C3, 0x14A0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
0xA7DB, 0xB7FA, 0x8799, 0x97B8, 0xE75F, 0xF77E, 0xC71D, 0xD73C, 0x26D3, 0x36F2, 0x0691, 0x16B0, 0x6657, 0x7676, 0x4615, 0x5634,
0xD94C, 0xC96D, 0xF90E, 0xE92F, 0x99C8, 0x89E9, 0xB98A, 0xA9AB, 0x5844, 0x4865, 0x7806, 0x6827, 0x18C0, 0x08E1, 0x3882, 0x28A3,
0xCB7D, 0xDB5C, 0xEB3F, 0xFB1E, 0x8BF9, 0x9BD8, 0xABBB, 0xBB9A, 0x4A75, 0x5A54, 0x6A37, 0x7A16, 0x0AF1, 0x1AD0, 0x2AB3, 0x3A92,
0xFD2E, 0xED0F, 0xDD6C, 0xCD4D, 0xBDAA, 0xAD8B, 0x9DE8, 0x8DC9, 0x7C26, 0x6C07, 0x5C64, 0x4C45, 0x3CA2, 0x2C83, 0x1CE0, 0x0CC1,
0xEF1F, 0xFF3E, 0xCF5D, 0xDF7C, 0xAF9B, 0xBFBA, 0x8FD9, 0x9FF8, 0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0};

    ptCur = ptStart;
    CRCValue = 0xFFFF;
    while (ptCur < ptEnd)
    {
        CRCTableEntry = CRCValue / 0x0100;         // high byte of the current CRC
        CRCTableEntry ^= (unsigned short)(*ptCur); // combined with the next data byte
        CRCValue &= 0x000000FF;
        CRCValue *= 0x00000100;                    // low byte moved to the high byte
        CRCValue ^= CRCTable[CRCTableEntry];
        ptCur++;
    }
    return CRCValue;
} // calcCRC

// The CRC covers the string section only (string start to end of file).
static int getCRC(char *ptTable)
{
    char *ptStart;
    char *ptEnd;

    if (ptTable == NULL)
        return -1;
    ptStart = getptStringStart(ptTable);
    ptEnd = getptStringEnd(ptTable);
    return calcCRC((unsigned char *)ptStart, (unsigned char *)ptEnd);
} // getCRC

static bool setCRC(char *ptTable)
{
    int CRCValue;

    if (ptTable == NULL)
        return false;
    if ((CRCValue = getCRC(ptTable)) != -1)
    {
        (*(unsigned short *)(ptTable + CRCOffset)) = (unsigned short)CRCValue;
        return true;
    }
    else
        return false;
} // setCRC

/////////////////
// Hash functions
/////////////////
static int getHash(char *ptKeyString, int HashTableSize)
{
    char charValue;
    unsigned int hashValue;
    char *ptKeyStringChar;

    hashValue = 0;
    ptKeyStringChar = ptKeyString;
    while ((charValue = *ptKeyStringChar++) != '\0')
    {
        hashValue *= 0x10;
        hashValue += charValue;
        if ((hashValue & 0xF0000000) != 0)
        {
            // fold the top four bits back into the low bits
            unsigned int tempValue = hashValue & 0xF0000000;
            tempValue /= 0x01000000;
            hashValue &= 0x0FFFFFFF;
            hashValue ^= tempValue;
        }
    }
    return hashValue % HashTableSize;
} // getHash

///////////////////////////////////
// Internal string search functions
///////////////////////////////////
static int getString(char *ptTable, char *ptKeyString, char **ptString)
{
    int HashTableSize;
    int NumLoops;
    char *ptFirstNode;
    char *ptNode;
    int HashValue;
    int Loop;
    char *ptIdxString;

    HashTableSize = getHashTableSize(ptTable);
    if (HashTableSize <= 0) // avoid dividing by zero in getHash
        return -1;
    NumLoops = getNumLoops(ptTable);
    ptFirstNode = getptFirstNode(ptTable);
    HashValue = getHash(ptKeyString, HashTableSize);
    Loop = 0;
    while (Loop++ < NumLoops)
    {
        ptNode = ptFirstNode + NodeSize*HashValue;
        // test the flag byte itself; "*ptNode + ActiveOffset" only
        // worked because ActiveOffset happens to be 0
        if (*(ptNode + ActiveOffset) == 1)
        {
            ptIdxString = getptIdxString(ptTable, ptNode);
            if (strcmp(ptIdxString, ptKeyString) == 0)
            {
                *ptString = getptNameString(ptTable, ptNode);
                return getIdxNum(ptNode);
            }
        }
        // not found here: try the next node, wrapping around at the end
        HashValue++;
        HashValue %= HashTableSize;
    }
    return -1;
} // getString

static char *getStringNum(char *ptTable, int KeyNum)
{
    char *ptFirstNode;
    char *ptNode;
    char *ptElement;
    int NodeNum;

    // reject numbers outside the element table
    if (KeyNum < 0 || KeyNum >= getNumElements(ptTable))
        return NULL;
    ptFirstNode = getptFirstNode(ptTable);
    ptElement = ptTable + HeaderSize + ElementSize*KeyNum;
    NodeNum = getNodeNum(ptElement);
    if (NodeNum >= getHashTableSize(ptTable))
        return NULL;
    ptNode = ptFirstNode + NodeSize*NodeNum;
    if (*(ptNode + ActiveOffset) == 1)
    {
        return getptNameString(ptTable, ptNode);
    }
    return NULL;
} // getStringNum

///////////////////////////////////
// Exported string search functions
///////////////////////////////////
int getNumStringByName(char *ptKeyString, char **ptString)
{
    int IdxNbr;

    if ((!IsInit) || (ptKeyString == NULL))
    {
        *ptString = NULL;
        return -1;
    }
    if (ptPatchStringTable != NULL)
    {
        if ((IdxNbr = getString(ptPatchStringTable, ptKeyString, ptString)) != -1)
            // KeyString found in patchstring.tbl
            return IdxNbr + PatchStringKeyNum;
    }
    if (ptExpansionStringTable != NULL)
    {
        if ((IdxNbr = getString(ptExpansionStringTable, ptKeyString, ptString)) != -1)
            // KeyString found in expansionstring.tbl
            return IdxNbr + ExpansionStringKeyNum;
    }
    if (ptStringTable != NULL)
    {
        if ((IdxNbr = getString(ptStringTable, ptKeyString, ptString)) != -1)
            // KeyString found in string.tbl
            return IdxNbr + StringKeyNum;
    }
    // KeyString was not found
    *ptString = strNameNotFound;
    return -1;
} // getNumStringByName

char *getStringByName(char *ptKeyString)
{
    char *ptString = NULL;

    getNumStringByName(ptKeyString, &ptString);
    return ptString;
} // getStringByName

char *getStringByNum(int KeyNum)
{
    char *ptString = NULL;

    if (!IsInit)
        return NULL;
    if (KeyNum >= ExpansionStringKeyNum)
    {
        if (ptExpansionStringTable != NULL)
        {
            if ((ptString = getStringNum(ptExpansionStringTable, KeyNum - ExpansionStringKeyNum)) != NULL)
                // KeyNum found in expansionstring.tbl
                return ptString;
        }
    }
    else if (KeyNum >= PatchStringKeyNum)
    {
        if (ptPatchStringTable != NULL)
        {
            if ((ptString = getStringNum(ptPatchStringTable, KeyNum - PatchStringKeyNum)) != NULL)
                // KeyNum found in patchstring.tbl
                return ptString;
        }
    }
    else
    {
        if (ptStringTable != NULL)
        {
            if ((ptString = getStringNum(ptStringTable, KeyNum - StringKeyNum)) != NULL)
                // KeyNum found in string.tbl
                return ptString;
        }
    }
    // KeyNum was not found
    return strNameNotFound;
} // getStringByNum

/////////////////
// initialization
/////////////////
static bool initTable(char **ptTable, char ptFileName[])
{
    FILE *Source;
    char Header[HeaderSize];
    unsigned int FileSize;

    if ((Source = fopen(ptFileName, "rb")) == NULL)
//  if ((Source = fileopen(ptFileName, "rb")) == NULL)
        return false;
    bool IsOK = false;
    // read the header first to learn the file size, then the whole file
    if (fread(Header, sizeof(char), sizeof(Header), Source) == sizeof(Header))
    {
        FileSize = getFileSize(Header);
        if (FileSize >= HeaderSize) // reject obviously bad headers
        {
            allocateMemory(ptTable, FileSize);
            rewind(Source);
            if (fread(*ptTable, sizeof(char), FileSize, Source) == FileSize)
                IsOK = true;
            else
            {
                free(*ptTable);
                *ptTable = NULL;
            }
        }
    }
    fclose(Source);
//  fileclose(Source);
    return IsOK;
} // initTable

// return type was missing in the original
static void writeResult(FILE *WriteDestination, char *strText)
{
    if (WriteDestination != NULL)
        fprintf(WriteDestination, "Found: %s\n", strText);
} // writeResult

bool initStringTables(FILE *WriteDestination)
{
    if (!IsInit)
    {
        if (initTable(&ptStringTable, strStringFilename))
            writeResult(WriteDestination, strStringFilename);
        if (initTable(&ptExpansionStringTable, strExpansionStringFilename))
            writeResult(WriteDestination, strExpansionStringFilename);
        if (initTable(&ptPatchStringTable, strPatchStringFilename))
            writeResult(WriteDestination, strPatchStringFilename);
        IsInit = true;
    }
    return IsInit;
} // initStringTables

// Takes the address of the table pointer so that the caller's pointer
// is reset to NULL; passing it by value left the globals dangling.
bool closeTable(char **ptTable)
{
    if (ptTable != NULL && *ptTable != NULL)
        deallocateMemory(ptTable);
    return true;
} // closeTable

bool closeStringTables(void)
{
    if (IsInit)
    {
        closeTable(&ptStringTable);
        closeTable(&ptExpansionStringTable);
        closeTable(&ptPatchStringTable);
        IsInit = false;
    }
    return true;
} // closeStringTables

////////////////
// testing stuff
////////////////
void writefile(const char filename[], char *ptTable)
{
    FILE *target;
    unsigned short NumElements;
    int HashTableSize;
    int NumLoops;
    char *ptFirstNode;
    char *ptNode;
    char *ptElement;
    int NodeNum;
    int idx;

    if (ptTable == NULL) // this table was not loaded
        return;
    // standard fopen/fclose instead of the custom fileopen/fileclose
    if ((target = fopen(filename, "w")) == NULL)
        return;
    NumElements = getNumElements(ptTable);
    HashTableSize = getHashTableSize(ptTable);
    NumLoops = getNumLoops(ptTable);
    ptFirstNode = getptFirstNode(ptTable);
    fprintf(target, "%s\n", filename);
    fprintf(target, "Elements: %d, Hashs: %d, Loops: %d\n", NumElements, HashTableSize, NumLoops);
    fprintf(target, " Num EIdx Act HEIdx Hash Len String\n");
    for (idx = 0; idx < HashTableSize; idx++)
    {
        ptElement = ptTable + HeaderSize + ElementSize*idx;
        NodeNum = getNodeNum(ptElement);
        ptNode = ptFirstNode + NodeSize*idx;
        fprintf(target, "%5d", idx);
        if (idx < NumElements)
            fprintf(target, " %5d", NodeNum);
        else
            fprintf(target, " ");
        fprintf(target, " %1d %5d %5d %5d",
                *(ptNode + ActiveOffset),
                (*(unsigned short *)(ptNode + IdxNbrOffset)),
                (*(int *)(ptNode + HashValueOffset)),
                (*(unsigned short *)(ptNode + NameLenOffset)));
        fprintf(target, " %-25s", getptIdxString(ptTable, ptNode));
        fprintf(target, " %-80s", getptNameString(ptTable, ptNode));
        fprintf(target, "\n");
    }
    fclose(target);
} // writefile

void teststringtable(void)
{
    if (initStringTables(stdout))
    {
        getCRC(ptStringTable);
        getCRC(ptPatchStringTable);
        getCRC(ptExpansionStringTable);
        writefile("infostring.txt", ptStringTable);
        writefile("infopstring.txt", ptPatchStringTable);
        writefile("infoestring.txt", ptExpansionStringTable);
    }
} // teststringtable
Code: Select all
// stringtable.h
#include <stdio.h> // for FILE

// ptString receives the address of the found string, so it is a char **
extern int getNumStringByName(char *ptKeyString, char **ptString);
extern char *getStringByName(char *ptKeyString);
extern char *getStringByNum(int KeyNum);
extern bool initStringTables(FILE *WriteDestination);
extern bool closeStringTables(void);
extern void teststringtable(void);
Last edited by Jarulf on Mon Feb 16, 2004 12:55 am, edited 8 times in total.
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Myhrginoc wrote:
Additionally, there are two word values at offset +02 and +04 that contain the number of string records in the file. I don't know why two values, and the second value is never less than the first as far as I have seen. Perhaps the first value is the number of strings and the second is the maximum number of strings the file has ever had.
Actually, the value at +02 holds the number of "elements" in the file, that is, in the part of the file where you look up a string by number. The value at +04 holds the hash table size, that is, the number of entries in it. Those two do not necessarily have to be the same. The reason the second is at times larger is that the hash table in some string table files holds empty entries, while the element table never seems to do that.
Also note that the game code never uses the version number for anything, probably because Blizzard has never updated it and hence all string tables are version "0".
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Re: string table format?
Just a question.
My tbl reading code is more or less like yours, Jarulf, but I get the worst performance I have ever had for loading it (it takes some seconds to load string.tbl).
What reading time did you get with this one?
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
Joel wrote:
Just a question.
My tbl reading code is more or less like yours, Jarulf, but I get the worst performance I have ever had for loading it (it takes some seconds to load string.tbl).
What reading time did you get with this one?
Ehh, you mean reading the file into memory? I would say instantly. I use the standard C function for reading a file (fread) and read it in two steps: first the header to find the file size, and then the rest of the file. I never clocked it or anything. I have the .tbl files extracted and in some folder on the hard disk.
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Re: string table format?
OK ... I've just tried your code, and it seems that the only thing that differs between my code and yours is that I need to copy each string into a string object for further editing, so there is a lot of new/delete in nested loops ...
Quite bad.
Will rework it.
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
Joel wrote:
OK ... I've just tried your code, and it seems that the only thing that differs between my code and yours is that I need to copy each string into a string object for further editing, so there is a lot of new/delete in nested loops ...
Quite bad.
Will rework it.
Ahh yeah, that would severely hit performance. Of course, it might not matter much if it is just a one-time initialization, but if you are like me, you would hate it and recode it to be faster.
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Re: string table format?
Paul just gave me an idea:
After opening the tbl, I DON'T read it as a whole.
I use file mapping to get the keys and display them in a listbox.
Then I only allocate string memory for the strings that are currently being edited.
And it's unlikely that a single user will edit ALL strings at once ...
Just by using file mapping, I've gone from 13.12 s to 0.15 s for loading a tbl ...
No comment.
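For readers who have not used file mapping: the post above refers to the Win32 file-mapping API, whose code is not shown in the thread. The sketch below swaps in the POSIX equivalent (open, fstat, mmap) purely to illustrate the idea of reading a file through a memory image instead of copying it; it is not Joel's class, and MappedFile, map_file_readonly and unmap_file are names invented for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wrapper: a read-only view of a whole file. */
typedef struct {
    const unsigned char *data;
    size_t size;
} MappedFile;

/* Maps the file at `path` read-only. Returns 0 on success, -1 on failure. */
static int map_file_readonly(const char *path, MappedFile *out)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    out->data = (const unsigned char *)p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Releases the mapping created by map_file_readonly. */
static void unmap_file(MappedFile *m)
{
    if (m->data != NULL)
        munmap((void *)m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```

Once mapped, data can be read like an ordinary unsigned char array; the operating system pages the file in on demand, so only the parts actually touched are read from disk.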
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
Joel wrote:
Paul just gave me an idea:
After opening the tbl, I DON'T read it as a whole.
I use file mapping to get the keys and display them in a listbox.
Then I only allocate string memory for the strings that are currently being edited.
And it's unlikely that a single user will edit ALL strings at once ...
Just by using file mapping, I've gone from 13.12 s to 0.15 s for loading a tbl ...
No comment.
Why not actually read the whole file at once into some kind of buffer, then move a string into your string memory when editing it? That seems better (using some memory moves as needed) than having the file open all the time (or opening and closing it) and accessing it each time you mess with a string.
-
- Paladin
- Posts: 150
- Joined: Thu Jul 25, 2002 4:09 am
- Location: Indianapolis, IN, USA
Re: string table format?
Joel wrote:
After opening the tbl, I DON'T read it as a whole.
I use file-mapping to get the key and display them into a listbox.
Then i only allocate string memory for string that are currently edited.
And it's unlikely possible for a single user to edit ALL strings at once ...
Er, you're not talking about just reading strings from the file as you need them, are you? Because you can't edit the strings in place (with offsets and lengths and CRCs and all). You are just postponing the inevitable, since you will have to completely recreate the file when it is saved.
What someone else suggested, reading the whole file into memory first, would actually work though. I changed a ROM editing program I wrote to do just that and saw a monumental speed increase (an order of magnitude).
---Evil Peer
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Re: string table format?
Actually, if I want to do it that way, I can ...
I've packed some low-level Win32 file-mapping functions into a usable class, and this grants me random file access with no speed overhead.
The file is opened once and kept open as long as I want to use it.
However, the Win32 API allows me to open it in shared mode, so other processes can still access it.
It's like reading the file into memory, but everything is done by the kernel and with minimal memory impact.
Actually, I open the file and get a memory image of it, then I read data from this image in the same way I would read a simple unsigned char[].
For some formats it works very well; I can read and write the file concurrently at will (like DC6 and other non-compressed, non-CRCed files).
For tbl, of course, the mere presence of the CRC, the offset table and so on prevents me from doing this. So I only reallocate string space when the user edits one of the strings.
At save time, the only thing I have to do is recalculate the hashes and the CRC, but the big string chunk of the file is already allocated ...
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
As far as I know, the CRC is never used or checked. I think Ondo might have skipped it in his program (that initial Perl stuff). Or at least he mentioned he tested without it and it worked great. One needs to update the hashes and string entries, of course, if one edits strings.
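For anyone who wants to write a CRC anyway, here is a small bit-by-bit sketch in C. It assumes the 256-entry table in the listing is the standard table for polynomial 0x1021 (its first entries, 0x0000, 0x1021, 0x2042, suggest so), in which case this should give the same result as the table-driven calcCRC with its initial value of 0xFFFF. The function name crc16_ccitt is made up here; according to the listing's getCRC, the checksum is taken over the string section only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021 and initial value 0xFFFF,
   processing the most significant bit first, no final XOR. */
static uint16_t crc16_ccitt(const unsigned char *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        /* bring the next byte into the high byte of the CRC */
        crc ^= (uint16_t)((unsigned int)data[i] << 8);
        for (bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

The table version trades 512 bytes of table for eight fewer shift steps per byte; for files the size of a string table either form is fast enough.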
-
- Dominion
- Posts: 6921
- Joined: Mon May 27, 2002 7:19 am
- Location: Orsay
Re: string table format?
I don't think so. When I first tried to write a tbl, my CRC value was wrong and the game crashed on startup.
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
Joel wrote:
I don't think so. When I first tried to write a tbl, my CRC value was wrong and the game crashed on startup.
Strange, Ondo said he had used a blank CRC with no problem. Will check, just for the sake of it.
-
- Paladin
- Posts: 150
- Joined: Thu Jul 25, 2002 4:09 am
- Location: Indianapolis, IN, USA
Re: string table format?
Jarulf wrote:
As far as I know, the CRC is never used or checked. I think Ondo might have skipped it in his program (that initial Perl stuff). Or at least he mentioned he tested without it and it worked great. One needs to update the hashes and string entries, of course, if one edits strings.
Er, I meant the hash function when I said CRC.
---Evil Peer
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
Re: string table format?
Hmmm, that code kinda confuses me, as I'm not very good at C/C++ yet ... Is there by chance a document of the specs?
-MAKF1127
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
MAKF1127 wrote:
Hmmm, that code kinda confuses me, as I'm not very good at C/C++ yet ... Is there by chance a document of the specs?
Look at the start of it; that basically tells you the layout of the file.
Let's see, I will do this from memory, so I hope I don't get it wrong.
First comes a header:
// Header info location
CRCOffset = 0x00, // word
First there is a word-sized CRC value. It is not really used or checked, so you can actually ignore it.
NumElementsOffset = 0x02, // word
This tells the number of entries in the "element" section.
HashTableSizeOffset = 0x04, // dword
The same for the hash table: the number of entries in it.
VersionOffset = 0x08, // byte
This is the version of the file. Since there has only ever been one version, it should be 0 in all files (as noted earlier in the thread, all string tables are version "0").
StringStartOffset = 0x09, // dword
This value tells at what offset in the file the strings are stored.
NumLoopsOffset = 0x0D, // dword
This value is part of the hash algorithm. Basically, it is the maximum number of "misses" you can have during a lookup before giving up.
FileSizeOffset = 0x11, // dword
This tells the size of the file.
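The header fields above can be pulled out of the first 0x15 bytes of the file roughly like this. This is only a sketch: the struct and function names (TblHeader, parse_tbl_header, hdr_rd16, hdr_rd32) are invented for the example, and little-endian byte order is an assumption based on the listing casting pointers directly on a PC. Reading byte by byte avoids depending on alignment or on the byte order of the machine running the code.

```c
#include <stdint.h>

/* Hypothetical in-memory form of the 0x15-byte header. */
typedef struct {
    uint16_t crc;             /* 0x00, word  */
    uint16_t num_elements;    /* 0x02, word  */
    uint32_t hash_table_size; /* 0x04, dword */
    uint8_t  version;         /* 0x08, byte  */
    uint32_t string_start;    /* 0x09, dword */
    uint32_t num_loops;       /* 0x0D, dword */
    uint32_t file_size;       /* 0x11, dword */
} TblHeader;

/* Little-endian readers (byte order is an assumption, see above). */
static uint16_t hdr_rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t hdr_rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* `buf` must point to at least 0x15 bytes. */
static TblHeader parse_tbl_header(const unsigned char *buf)
{
    TblHeader h;
    h.crc             = hdr_rd16(buf + 0x00);
    h.num_elements    = hdr_rd16(buf + 0x02);
    h.hash_table_size = hdr_rd32(buf + 0x04);
    h.version         = buf[0x08];
    h.string_start    = hdr_rd32(buf + 0x09);
    h.num_loops       = hdr_rd32(buf + 0x0D);
    h.file_size       = hdr_rd32(buf + 0x11);
    return h;
}
```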
Following this header comes the element table.
It has the number of entries that was listed above in the header. Each entry is just a word-sized value indicating which entry in the hash table corresponds to it. So if we want to fetch entry 45, the game will look at the 46th entry of the element table (counting starts at 0). There it finds a number, for example 23 (I am just making up values here); thus string 45 is stored as entry number 23 in the hash table. This makes it possible to look up strings faster than by hashing and then searching (so to speak), and if you don't want to keep a found string around, it is a fast way to look it up again later. The game uses this method for some strings.
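The lookup by number described above comes down to one array read. A minimal sketch, assuming a little-endian file already loaded into memory; element_to_node and the DEMO_ constants are names made up for this example:

```c
#include <stddef.h>

enum { DEMO_HEADER_SIZE = 0x15, DEMO_ELEMENT_SIZE = 0x02 };

/* Returns the hash-table node number stored for string number `index`,
   or -1 if `index` is outside the element table. `table` points to the
   start of the loaded file. */
static long element_to_node(const unsigned char *table,
                            unsigned int num_elements,
                            unsigned int index)
{
    const unsigned char *entry;

    if (index >= num_elements)
        return -1;
    /* the element table starts right after the header */
    entry = table + DEMO_HEADER_SIZE + (size_t)DEMO_ELEMENT_SIZE * index;
    return (long)(entry[0] | (entry[1] << 8));
}
```

With the made-up values from the post, element_to_node(table, num_elements, 45) would return 23.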
Next in the file comes the hash table, where each entry is called a "node". Each node has the following layout:
// Node info location
ActiveOffset = 0x00, // byte
This is just a flag telling whether the entry is actually "active" and exists at all. Most entries should be, but occasionally you find one that is not.
IdxNbrOffset = 0x01, // word
This tells which entry in the element table this node corresponds to. At that entry you should thus find a value that matches the position of this node in the hash table.
HashValueOffset = 0x03, // dword
This holds the hash value of the search string that this entry has; see below about hash values.
IdxStringOffset = 0x07, // dword
This tells the offset of the search string (it will be somewhere inside the string section of the file). Note that all offsets are relative to the start of the file.
NameStringOffset = 0x0B, // dword
This holds the offset of the actual string that corresponds to the search string.
NameLenOffset = 0x0F, // word
This holds the length of that string.
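In the same spirit, one 0x11-byte node could be decoded like this. Again a sketch only: TblNode, parse_tbl_node and the two reader helpers are hypothetical names, and little-endian byte order is assumed.

```c
#include <stdint.h>

/* Hypothetical in-memory form of one 0x11-byte hash-table node. */
typedef struct {
    uint8_t  active;       /* 0x00, byte: 1 if the entry is in use        */
    uint16_t index;        /* 0x01, word: entry number in element table   */
    uint32_t hash;         /* 0x03, dword: hash value of the search string */
    uint32_t key_offset;   /* 0x07, dword: file offset of search string   */
    uint32_t value_offset; /* 0x0B, dword: file offset of actual string   */
    uint16_t value_len;    /* 0x0F, word: length of the actual string     */
} TblNode;

static uint16_t node_rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t node_rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* `p` must point to at least 0x11 bytes. */
static TblNode parse_tbl_node(const unsigned char *p)
{
    TblNode n;
    n.active       = p[0x00];
    n.index        = node_rd16(p + 0x01);
    n.hash         = node_rd32(p + 0x03);
    n.key_offset   = node_rd32(p + 0x07);
    n.value_offset = node_rd32(p + 0x0B);
    n.value_len    = node_rd16(p + 0x0F);
    return n;
}
```

Note that the fields are not aligned (a word at offset 1, dwords at offsets 3, 7 and 11), which is one reason to read them byte by byte rather than overlaying a struct on the file data.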
So, how does one normally look up a string? Well, you provide a lookup string. Now we have to hash that lookup string, which basically returns a value that corresponds to it. In an ideal situation (a fully optimized hash table), each lookup string would have its own unique hash value. However, that is not the case here; several different lookup strings can give the same hash value.
Anyway, we now take the hash value (modulo the hash table size), and it tells us which entry in the hash table to look at. Let's assume we got the hash value 55. We then look at node number 55 in the hash table. If we want, we can now verify that this entry indeed has the hash value 55 (see above for the part of the node that holds its hash value).
Now, as several lookup strings can give this hash value, we need to compare the actual lookup string with that of this node (the offset of which is given by the node, right?). If they match, we have found what we are looking for and can get the string, whose offset is found in the next-to-last field of the node.
If the lookup string for this entry was different, we need to check the next one. We do that by looking at the next node, node number 56 in this case (when we reach the last node, we restart at the first one, by the way), again comparing the lookup string of that entry. We continue to do that until we find a node whose lookup string matches ours OR until we have searched the maximum number of nodes (as given by the Loop entry in the header), since by then we can be sure that the string table file does not contain the lookup string we gave.
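The search just described can be sketched on a small in-memory table. The hash follows getHash from the listing (multiply by 16, add the character, fold the top four bits back in, then reduce modulo the table size); the table itself is simplified to an array of structs instead of raw file bytes. DemoNode, tbl_hash and tbl_lookup are names invented for this sketch. One stated difference: characters are treated as unsigned here, so keys containing bytes above 0x7F may hash differently than in the listing, which uses plain char.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified node: pointers instead of file offsets. */
typedef struct {
    int active;
    const char *key;   /* the lookup (search) string */
    const char *value; /* the actual string */
} DemoNode;

/* Hash of `key`, reduced to a slot number below table_size (must be > 0). */
static uint32_t tbl_hash(const char *key, uint32_t table_size)
{
    uint32_t h = 0;

    while (*key != '\0') {
        h = (h << 4) + (uint32_t)(unsigned char)*key++;
        if (h & 0xF0000000u) {
            /* fold the top four bits into bits 4..7 and clear them */
            uint32_t top = (h & 0xF0000000u) >> 24;
            h &= 0x0FFFFFFFu;
            h ^= top;
        }
    }
    return h % table_size;
}

/* Starts at the hashed slot and checks at most max_probes nodes,
   wrapping around at the end of the table. Returns NULL if not found. */
static const char *tbl_lookup(const DemoNode *nodes, uint32_t table_size,
                              uint32_t max_probes, const char *key)
{
    uint32_t slot = tbl_hash(key, table_size);
    uint32_t probe;

    for (probe = 0; probe < max_probes; probe++) {
        const DemoNode *n = &nodes[slot];
        if (n->active && strcmp(n->key, key) == 0)
            return n->value;
        slot = (slot + 1) % table_size;
    }
    return NULL;
}
```

The max_probes argument plays the role of the Loop value in the header: a tool that writes string tables has to store a value at least as large as the longest run of nodes any key needs to walk, or some lookups will stop too early.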
Hmm, did this make any sense? The code I have provides several ways to fetch the string, one way returns the pointer to the string, another retrun the string PLUS the entry value of it for the lement table. Another function makes it possible to instead such by the index number for the element table (could be some other methods too, not sure). I also have some functions for calculating the CRC value, that is not needed at all to read string table files. At the end I noticed some test functions I used to see if I had it all correct is still there. Also, I have functions in the code to actually read the files into memory in allocated memory (and to release that memory as well later).
If you still don't get it or have questions, please feel free to ask agian
Look at the start of it; that basically tells the layout of the file.
Let's see, I will do this from memory, so I hope I don't get anything wrong.
First comes a header:
// Header info location
CRCOffset = 0x00, // word
First there is a word-sized CRC value. It is not really used or checked, so you can actually ignore it.
NumElementsOffset = 0x02, // word
This tells the number of entries in the "element" section.
HashTableSizeOffset = 0x04, // dword
Same for the hash table: the number of entries.
VersionOffset = 0x08, // byte
This is the version of the file. Since there has only ever been one version, it should be 1 in all files (or is it 0? I don't remember).
StringStartOffset = 0x09, // dword
This value tells at what offset in the file the strings are stored.
NumLoopsOffset = 0x0D, // dword
This is a value that is part of the hash algorithm. Basically, it is the maximum number of "misses" you can have during a lookup.
FileSizeOffset = 0x11, // dword
This tells the size of the file.
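For what it's worth, here is a minimal sketch of reading that header in C++. It assumes the whole file is already loaded into a byte buffer and that multi-byte values are stored little-endian (the post does not say, so treat that as an assumption). The names `TblHeader`, `readWord`, `readDword` and `parseHeader` are made up for illustration and are not taken from the code mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets taken from the header listing above.
enum HeaderOffset : std::size_t {
    CRCOffset           = 0x00, // word
    NumElementsOffset   = 0x02, // word
    HashTableSizeOffset = 0x04, // dword
    VersionOffset       = 0x08, // byte
    StringStartOffset   = 0x09, // dword
    NumLoopsOffset      = 0x0D, // dword
    FileSizeOffset      = 0x11, // dword
    HeaderSize          = 0x15  // 21 bytes in total
};

// Assumption: multi-byte values are little-endian.
inline std::uint16_t readWord(const std::vector<std::uint8_t>& d, std::size_t at) {
    assert(at + 2 <= d.size());
    return static_cast<std::uint16_t>(d[at] | (d[at + 1] << 8));
}

inline std::uint32_t readDword(const std::vector<std::uint8_t>& d, std::size_t at) {
    assert(at + 4 <= d.size());
    return static_cast<std::uint32_t>(d[at]) |
           (static_cast<std::uint32_t>(d[at + 1]) << 8) |
           (static_cast<std::uint32_t>(d[at + 2]) << 16) |
           (static_cast<std::uint32_t>(d[at + 3]) << 24);
}

// Hypothetical in-memory copy of the header fields.
struct TblHeader {
    std::uint16_t crc;
    std::uint16_t numElements;
    std::uint32_t hashTableSize;
    std::uint8_t  version;
    std::uint32_t stringStart;
    std::uint32_t numLoops;
    std::uint32_t fileSize;
};

// Fills `out` from the first 21 bytes of `d`.
// Returns false if the buffer is too short to hold a header.
inline bool parseHeader(const std::vector<std::uint8_t>& d, TblHeader& out) {
    if (d.size() < HeaderSize) return false;
    out.crc           = readWord(d, CRCOffset);
    out.numElements   = readWord(d, NumElementsOffset);
    out.hashTableSize = readDword(d, HashTableSizeOffset);
    out.version       = d[VersionOffset];
    out.stringStart   = readDword(d, StringStartOffset);
    out.numLoops      = readDword(d, NumLoopsOffset);
    out.fileSize      = readDword(d, FileSizeOffset);
    return true;
}
```

The field sizes add up to 21 bytes, which matches the header length given in the perl script comments quoted earlier in the thread.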
Following this header comes the element table.
It has the number of entries listed above in the header. Each entry is just a word-sized value indicating which entry in the hash table corresponds to it. So if we want to fetch entry 45, the game will look at the 46th entry of the element table. There it finds a number, for example 23 (I am just making up values here), so string 45 will be handled as entry number 23 in the hash table. This makes it possible to look up strings faster than hashing and then searching (so to speak), or, if you don't want to save a found string, it gives a fast way to look it up again later. The game uses this method for some strings.
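The element table lookup described above could be sketched like this. It assumes the table starts right after the 21-byte header, that entries are little-endian words, and that the file is already in a byte buffer; `nodeForElement` and `readWord` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kHeaderSize = 0x15;    // 21-byte header, per the header layout
const std::size_t kElementEntrySize = 2; // each element entry is one word

// Assumption: multi-byte values are little-endian.
inline std::uint16_t readWord(const std::vector<std::uint8_t>& d, std::size_t at) {
    assert(at + 2 <= d.size());
    return static_cast<std::uint16_t>(d[at] | (d[at + 1] << 8));
}

// Returns the hash table node number stored for string number `index`
// (0-based), or -1 if `index` is out of range or the buffer is too short.
inline int nodeForElement(const std::vector<std::uint8_t>& data,
                          std::uint16_t numElements,
                          std::size_t index) {
    if (index >= numElements) return -1;
    const std::size_t at = kHeaderSize + index * kElementEntrySize;
    if (at + kElementEntrySize > data.size()) return -1;
    return readWord(data, at);
}
```

With the made-up numbers from the post, `nodeForElement(data, numElements, 45)` would read the 46th word of the table and return 23.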
Next in the file comes the hash table, where each entry is called a "node". Each node has the following layout:
// Node info location
ActiveOffset = 0x00, // byte
This is just a flag telling if the entry is actually "active" and exists at all. Most entries should be, but occasionally you find one that is not.
IdxNbrOffset = 0x01, // word
This tells which entry in the element table this node corresponds to. At that entry you should thus find a value that corresponds to the position of this node in the hash table.
HashValueOffset = 0x03, // dword
This holds the hash value for the string (search string) that this entry has, see below about hash values.
IdxStringOffset = 0x07, // dword
This tells the offset of the search string (it will be somewhere inside the string section of the file). Note that all offsets are relative to the whole file.
NameStringOffset = 0x0B, // dword
This holds the offset of the actual string that corresponds to the search string.
NameLenOffset = 0x0F, // word
This holds the length of that string.
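As a rough sketch, one node could be decoded like this in C++. It assumes little-endian values and a file already loaded into a byte buffer; the caller passes in where the hash table starts (21 + 2 * number of elements, if the sections really follow each other without padding). `TblNode` and `parseNode` are hypothetical names, not the ones from the code mentioned in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets inside one node, taken from the node listing above.
enum NodeOffset : std::size_t {
    ActiveOffset     = 0x00, // byte
    IdxNbrOffset     = 0x01, // word
    HashValueOffset  = 0x03, // dword
    IdxStringOffset  = 0x07, // dword
    NameStringOffset = 0x0B, // dword
    NameLenOffset    = 0x0F, // word
    NodeSize         = 0x11  // 17 bytes per node
};

// Assumption: multi-byte values are little-endian.
inline std::uint16_t readWord(const std::vector<std::uint8_t>& d, std::size_t at) {
    assert(at + 2 <= d.size());
    return static_cast<std::uint16_t>(d[at] | (d[at + 1] << 8));
}

inline std::uint32_t readDword(const std::vector<std::uint8_t>& d, std::size_t at) {
    assert(at + 4 <= d.size());
    return static_cast<std::uint32_t>(d[at]) |
           (static_cast<std::uint32_t>(d[at + 1]) << 8) |
           (static_cast<std::uint32_t>(d[at + 2]) << 16) |
           (static_cast<std::uint32_t>(d[at + 3]) << 24);
}

// Hypothetical in-memory copy of one node.
struct TblNode {
    bool          active;
    std::uint16_t idxNbr;           // entry in the element table
    std::uint32_t hashValue;
    std::uint32_t idxStringOffset;  // file offset of the search string
    std::uint32_t nameStringOffset; // file offset of the actual string
    std::uint16_t nameLen;
};

// Decodes node number `nodeNumber` of a hash table that begins at file
// offset `hashTableStart`. Returns false if the node lies outside the buffer.
inline bool parseNode(const std::vector<std::uint8_t>& d,
                      std::size_t hashTableStart,
                      std::size_t nodeNumber,
                      TblNode& out) {
    const std::size_t base = hashTableStart + nodeNumber * NodeSize;
    if (base + NodeSize > d.size()) return false;
    out.active           = d[base + ActiveOffset] != 0;
    out.idxNbr           = readWord(d, base + IdxNbrOffset);
    out.hashValue        = readDword(d, base + HashValueOffset);
    out.idxStringOffset  = readDword(d, base + IdxStringOffset);
    out.nameStringOffset = readDword(d, base + NameStringOffset);
    out.nameLen          = readWord(d, base + NameLenOffset);
    return true;
}
```

The field sizes add up to 17 bytes, which agrees with the "17 bytes per entry" in the perl script comments.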
So, how does one normally look up a string? Well, you provide a lookup string. Now we have to hash that lookup string. That basically returns a value that corresponds to the lookup string. In an ideal situation (for a fully optimized hash table), each lookup string would have its own unique hash value. However, that is not the case here: several different lookup strings can give the same hash value.
Anyway, we now take the hash value, and it corresponds to an entry in the hash table. Let's assume we got the hash value 55. We then look at node number 55 in the hash table. If we want, we can now verify that this entry indeed has the hash value 55 (see above for the part of the node that holds its hash value).
Now, as several lookup strings can give this hash value, we need to compare the actual lookup string with that of this node (the offset of which is given by the node, right?). If it matches, we have found what we are looking for and can get the string, whose offset is found in the next-to-last entry of the node.
If the lookup string for this entry was different, we need to search for the next one. We do that by checking the next node, node number 56 in this case (when we reach the last node, we restart at the first one, by the way), again comparing the lookup string of this entry. We continue to do that until we find a node whose lookup string matches ours OR until we have searched the maximum number of nodes (as given by the Loop entry in the header), since by then we can be sure that the string table file does not contain the lookup string we gave.
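The search described above is plain linear probing with wrap-around. Here is a small C++ sketch of it under a few assumptions: the nodes have already been read into memory with their key and value strings, the hash of the lookup string is computed elsewhere and passed in (the hash function itself is not shown here), the Loop value is treated as the maximum number of nodes to examine, and inactive nodes are simply skipped. `Node`, `lookup` and `lookupExample` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one node: the key and value strings have
// already been read from the offsets stored in the file.
struct Node {
    bool        active;
    std::string key;   // the lookup (search) string
    std::string value; // the actual string
};

// Probes the table starting at node `hash`, moving to the next node on each
// mismatch and wrapping from the last node to the first. Gives up after
// `numLoops` nodes have been examined. Returns nullptr if nothing matched.
inline const std::string* lookup(const std::vector<Node>& table,
                                 std::uint32_t numLoops,
                                 std::uint32_t hash,
                                 const std::string& key) {
    if (table.empty()) return nullptr;
    std::size_t pos = hash % table.size(); // defensive: keep the start in range
    for (std::uint32_t tries = 0; tries < numLoops; ++tries) {
        const Node& n = table[pos];
        // Assumption: an inactive node never matches, but probing continues.
        if (n.active && n.key == key) return &n.value;
        pos = (pos + 1) % table.size();
    }
    return nullptr;
}

// Small usage example with made-up keys and values.
inline void lookupExample() {
    // "b" hashed to node 1 but was displaced to node 2; "d" hashed to
    // node 3 and wrapped around to node 0.
    const std::vector<Node> table = {
        {true, "d", "four"}, {true, "a", "one"},
        {true, "b", "two"},  {true, "c", "three"}};
    const std::string* hit = lookup(table, 2, 1, "b");
    assert(hit != nullptr && *hit == "two");
    assert(lookup(table, 2, 1, "missing") == nullptr);
}
```

Comparing only the key strings keeps the sketch independent of how the stored hash value relates to the node number; checking the stored hash first, as the post suggests, would just be an extra filter before the string compare.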
Hmm, did this make any sense? The code I have provides several ways to fetch the string: one returns a pointer to the string, another returns the string PLUS its entry number in the element table. Another function makes it possible to search by the index number in the element table instead (there could be some other methods too, not sure). I also have some functions for calculating the CRC value, which is not needed at all to read string table files. I noticed that some test functions I used to check that I had it all correct are still there at the end. I also have functions in the code to read the files into allocated memory (and to release that memory later).
If you still don't get it or have questions, please feel free to ask again.
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
Re: string table format?
Alright, I turned this into a table to make it more clear. Is THIS correct? Also, I may have a few questions on the offset descriptions as well. Thanks a lot
-MAKF1127
Last edited by MAKF1127 on Thu Mar 06, 2003 5:17 pm, edited 1 time in total.
-MAKF1127
-
- Champion of the Light
- Posts: 346
- Joined: Sun May 26, 2002 9:20 am
Re: string table format?
[quote="MAKF1127";p="74961"]Alright, I turned this into a table to make it more clear. Is THIS correct? Also, I may have a few questions on the offset descriptions as well. Thanks a lot
-MAKF1127[/quote]
Yes, it looks correct. Note that I have been a bit inconsistent with what I call "offset". In "IdxStringOffset", for example, I typically mean the offset within the node of the field that tells where the index string is located. That value is itself an offset, of course. But I think you got it, since it looks OK.
Also, I checked and indeed, the version value is always 0 in all files so far.
What are the questions you had?
-
- Posts: 97
- Joined: Sun Jun 30, 2002 12:11 am
- Location: Colorado, USA
Re: string table format?
What I did was create a small string table with Darkstorm's table editor, and I examined it according to the table I made from what you said.
Okay, here are a few:
Questions for the actual offset descriptions:
1.) Just what exactly is the "NumLoopsOffset"?
2.) What difference does it make whether an entry is active or not? Is it like this? If I have an entry called hax in patchstring.tbl and one in expansionstring.tbl (it checks patchstring.tbl first, IIRC), and the one in patchstring.tbl is not active but the one in expansionstring.tbl is, will it take the first active one?
3.) IdxNbrOffset <== As I understand it, this is just what index it is in the hash table... But when I looked at my test file, the second node's value here was zero...
4.) What exactly is HashValueOffset?
5.) What exactly is NameStringOffset?
6.) NameLenOffset <== My first entry had a key of "Test String1" and a value of "OneTwoThree"... Then I looked at this offset for the first node, and I got something like 36... Shouldn't it be 11? (OneTwoThree = 11 chars)
General File Structure Questions:
1.) What is the purpose of an element table? If I wanted the 45th node in the hash table, can't I just go right to it without an element table?
2.) When searching for things like item names, etc., do you just do a sequential search on all the keys?
Thanks a ton
MAKF1127
Last edited by MAKF1127 on Sat Mar 08, 2003 7:04 am, edited 1 time in total.
-MAKF1127