string table format?

This would be the forum for questions about how to work with mod making tools which can be a problem of its own.

Moderator: Paul Siramy

MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

string table format?

Post by MAKF1127 » Mon Dec 30, 2002 6:04 pm

Does someone know where I can find the file format of the .tbl files (Strings)? Or perhaps the source of a program which reads them?
Thanks
MAKF1127

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Post by Joel » Mon Dec 30, 2002 6:55 pm

Just examine the Perl script that is available on the Keep.
I also have a C++ class that handles it quite well.
"How much suffering, mortal, does it take before you lose your grace?"
Shadow Empire (coming soon) | forum

MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

Post by MAKF1127 » Mon Dec 30, 2002 8:22 pm

Where can I take a look at that C++ source? (I know C++ better than Perl.)
Thanks
MAKF1127

Myhrginoc
Retired Admin
Cherub
Posts: 12100
Joined: Sat May 25, 2002 7:28 am
Location: Percussion U
United States of America

Hand-picked

Post by Myhrginoc » Mon Dec 30, 2002 9:00 pm

Actually, you get a description of the format from the comments of the Perl script, but the actual hash function is in Perl.
Ondo and Mephansteras wrote:
# An intro to the string.tbl format.
#
# There are four main sections to the string.tbl file.
# First, the header. This is 21 bytes long.
# Second, an array with two bytes per entry, that gives an index into the next table. This allows lookups of strings by number.
# Third, a hash array, with 17 bytes per entry, which has the pointers to the key and value strings, and has the strings sorted basically by hash value. This allows lookups of strings by key.
# Fourth, the actual strings themselves.
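
To make the four-section layout concrete, here is a minimal sketch that computes where each section starts. It assumes the sections follow each other with no padding and that the two entry counts are already known (they are stored in the header). The names `SectionOffsets` and `computeSections` are made up for illustration and do not come from the Perl script.

```cpp
#include <cassert>
#include <cstddef>

// Sizes taken from the quoted comments: a 21-byte header, 2 bytes per
// index entry and 17 bytes per hash entry.
constexpr std::size_t kHeaderSize = 21;
constexpr std::size_t kIndexEntrySize = 2;
constexpr std::size_t kHashEntrySize = 17;

// Hypothetical helper type, not part of the original tools.
struct SectionOffsets {
    std::size_t indexTable;  // start of the 2-byte index array
    std::size_t hashTable;   // start of the 17-byte hash entries
    std::size_t strings;     // start of the string data
};

// Assumes the sections are contiguous, in the order listed above.
inline SectionOffsets computeSections(std::size_t numIndexEntries,
                                      std::size_t numHashEntries) {
    SectionOffsets s;
    s.indexTable = kHeaderSize;
    s.hashTable = s.indexTable + kIndexEntrySize * numIndexEntries;
    s.strings = s.hashTable + kHashEntrySize * numHashEntries;
    return s;
}
```

For example, a file with 10 index entries and 12 hash entries would have its hash array at offset 41 and its strings at offset 245.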
Additionally, there are two word values at offset +02 and +04 that contain the number of string records in the file. I don't know why two values, and the second value is never less than the first as far as I have seen. Perhaps the first value is the number of strings and the second is the maximum number of strings the file has ever had.
Do the right thing. It will gratify some people and astonish the rest.
~ Mark Twain
Run Diablo II in any version for mods: tutorial
The Terms of Service!! Know them, abide by them, and enjoy the forums at peace.
The Beginner's Guide v1.4: (MS Word | PDF) || Mod Running Scripts || TFW: Awakening

MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

Post by MAKF1127 » Mon Dec 30, 2002 10:59 pm

Where can I find the Perl script or the C++ source?
MAKF1127

TheWizard
Junior Member
Paladin
Posts: 160
Joined: Mon Oct 21, 2002 1:13 pm
Location: Kansas

Post by TheWizard » Tue Dec 31, 2002 12:43 am

Here is the link to the Keep's Tutorial on String Tables:
http://dynamic2.gamespy.com/~phrozenkee ... uettar.php

Hope this helps!

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Wed Jan 08, 2003 2:46 pm

Here is a file I once created (in C, although I might have used some C++ extensions) to read string table files in a larger program I wrote. I had some customized file open/close functions and also a memory allocation function, which is why I have double lines of code for those, one of which is commented out (edit: the version I found here seems to have been changed to include the memory allocation and such, so I can give away the file :) ).

I never got around to implementing everything I wanted, so a few things are only half done. I also have minimal comments, but it should hopefully be OK. The good thing is that it does include functions to actually parse the string tables in all the ways the game can, and the file format should be relatively easy to figure out from the constants at the start.

Enjoy!


Edit: This seems to be a slightly old version of the file I have. Not sure if it had any errors, though. I will check for the most recent version once I get to the computer that has those files.

Code:

// stringtable.cpp
//
// Created by Pedro Faria (Jarulf).
//
// Many thanks to Peter Hatch (Ondo) for information
// about the structure and algorithms regarding
// the file "string.tbl".

#include <stdio.h>	// remove if in stdafx.h
#include <stdlib.h>	// remove if in stdafx.h
#include <string.h>	// remove if in stdafx.h

#include "stdafx.h"

enum {
	// File info. Some are not used by program

	// Size info of various sections
	HeaderSize					= 0x15,
	ElementSize					= 0x02,
	NodeSize					= 0x11,

	// Header info location
	CRCOffset					= 0x00,		// word
	NumElementsOffset			= 0x02,		// word
	HashTableSizeOffset			= 0x04,		// dword
	VersionOffset				= 0x08,		// byte (always 0)
	StringStartOffset			= 0x09,		// dword
	NumLoopsOffset				= 0x0D,		// dword
	FileSizeOffset				= 0x11,		// dword

	// Element info location
	NodeNumOffset				= 0x00,		// word

	// Node info location
	ActiveOffset				= 0x00,		// byte
	IdxNbrOffset				= 0x01,		// word
	HashValueOffset				= 0x03,		// dword
	IdxStringOffset				= 0x07,		// dword
	NameStringOffset			= 0x0B,		// dword
	NameLenOffset				= 0x0F,		// word

	// KeyNums
	StringKeyNum				=     0,
	PatchStringKeyNum			= 10000,
	ExpansionStringKeyNum		= 20000
};

static bool		IsInit = false;
static char		*ptStringTable = NULL;
static char		*ptExpansionStringTable = NULL;
static char		*ptPatchStringTable = NULL;
static char		strStringFilename[] = "string.tbl";
static char		strExpansionStringFilename[] = "expansionstring.tbl";
static char		strPatchStringFilename[] = "patchstring.tbl";
static char		strNameNotFound[] = "Unknown name";
static char		strNull[] = "";



////////////////////
// Memory allocation
////////////////////

// just here to make compilation possible
// would be in other file normally
  
static void allocateMemory(void *ptMemx, int sizeMem)
{
	void **ptMem = (void **)ptMemx;

	if ((*ptMem = malloc(sizeMem)) == NULL)
	{
		printf("Error: Can't allocate %d bytes of memory, program terminated.\n", sizeMem);
		exit(EXIT_FAILURE);	// report failure instead of a success status
	}
} // allocateMemory


void deallocateMemory(void  *ptMemx)
{
	void **ptMem = (void **)ptMemx;

	if (*ptMem != NULL)
	{
		free(*ptMem);
		*ptMem=NULL;
	}
} // deallocateMemory



////////////////////
// Utility functions
////////////////////


static unsigned short getNumElements(char *ptTable)
{
	return *(unsigned short *) (ptTable + NumElementsOffset);
} // getNumElements


static int getHashTableSize(char *ptTable)
{
	return *(int *) (ptTable + HashTableSizeOffset);
} // getHashTableSize


static int getNumLoops(char *ptTable)
{
	return *(int *) (ptTable + NumLoopsOffset);
} // getNumLoops


static char *getptStringStart(char *ptTable)
{
	return ptTable + HeaderSize + ElementSize*getNumElements(ptTable) + NodeSize*getHashTableSize(ptTable);
} // getptStringStart


static char *getptStringEnd(char *ptTable)
{
	return ptTable + (*(unsigned int *)(ptTable + FileSizeOffset));
} // getptStringEnd


static char *getptFirstNode(char *ptTable)
{
	return ptTable + HeaderSize + ElementSize*getNumElements(ptTable);
} // getptFirstNode

static int getNodeNum(char *ptElement)
{
	return *(unsigned short *) (ptElement + NodeNumOffset);
} // getNodeNum


static int getIdxNum(char *ptNode)
{
	return *(unsigned short *)(ptNode + IdxNbrOffset);	// IdxNbr is a word, not a dword
} // getIdxNum


static char *getptIdxString(char *ptTable, char *ptNode)
{
	return ptTable + (*(int *)(ptNode + IdxStringOffset));
} // getptIdxString


static char *getptNameString(char *ptTable, char *ptNode)
{
	return ptTable + (*(int *)(ptNode + NameStringOffset));
} // getptNameString


static unsigned int getFileSize(char *ptHeader)
{
	return *(unsigned int *)(ptHeader + FileSizeOffset);
} // getFileSize


////////////////
// CRC functions
////////////////


static int calcCRC(unsigned char *ptStart, unsigned char *ptEnd)
{
	unsigned char	*ptCur;	
	unsigned short	CRCValue;
	unsigned short	CRCTableEntry;

	static const unsigned short	CRCTable[256] = {
		0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50A5, 0x60C6, 0x70E7, 0x8108, 0x9129, 0xA14A, 0xB16B, 0xC18C, 0xD1AD, 0xE1CE, 0xF1EF,
		0x1231, 0x0210, 0x3273, 0x2252, 0x52B5, 0x4294, 0x72F7, 0x62D6, 0x9339, 0x8318, 0xB37B, 0xA35A, 0xD3BD, 0xC39C, 0xF3FF, 0xE3DE,
		0x2462, 0x3443, 0x0420, 0x1401, 0x64E6, 0x74C7, 0x44A4, 0x5485, 0xA56A, 0xB54B, 0x8528, 0x9509, 0xE5EE, 0xF5CF, 0xC5AC, 0xD58D,
		0x3653, 0x2672, 0x1611, 0x0630, 0x76D7, 0x66F6, 0x5695, 0x46B4, 0xB75B, 0xA77A, 0x9719, 0x8738, 0xF7DF, 0xE7FE, 0xD79D, 0xC7BC,
		0x48C4, 0x58E5, 0x6886, 0x78A7, 0x0840, 0x1861, 0x2802, 0x3823, 0xC9CC, 0xD9ED, 0xE98E, 0xF9AF, 0x8948, 0x9969, 0xA90A, 0xB92B,
		0x5AF5, 0x4AD4, 0x7AB7, 0x6A96, 0x1A71, 0x0A50, 0x3A33, 0x2A12, 0xDBFD, 0xCBDC, 0xFBBF, 0xEB9E, 0x9B79, 0x8B58, 0xBB3B, 0xAB1A,
		0x6CA6, 0x7C87, 0x4CE4, 0x5CC5, 0x2C22, 0x3C03, 0x0C60, 0x1C41, 0xEDAE, 0xFD8F, 0xCDEC, 0xDDCD, 0xAD2A, 0xBD0B, 0x8D68, 0x9D49,
		0x7E97, 0x6EB6, 0x5ED5, 0x4EF4, 0x3E13, 0x2E32, 0x1E51, 0x0E70, 0xFF9F, 0xEFBE, 0xDFDD, 0xCFFC, 0xBF1B, 0xAF3A, 0x9F59, 0x8F78,
		0x9188, 0x81A9, 0xB1CA, 0xA1EB, 0xD10C, 0xC12D, 0xF14E, 0xE16F, 0x1080, 0x00A1, 0x30C2, 0x20E3, 0x5004, 0x4025, 0x7046, 0x6067,
		0x83B9, 0x9398, 0xA3FB, 0xB3DA, 0xC33D, 0xD31C, 0xE37F, 0xF35E, 0x02B1, 0x1290, 0x22F3, 0x32D2, 0x4235, 0x5214, 0x6277, 0x7256,
		0xB5EA, 0xA5CB, 0x95A8, 0x8589, 0xF56E, 0xE54F, 0xD52C, 0xC50D, 0x34E2, 0x24C3, 0x14A0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
		0xA7DB, 0xB7FA, 0x8799, 0x97B8, 0xE75F, 0xF77E, 0xC71D, 0xD73C, 0x26D3, 0x36F2, 0x0691, 0x16B0, 0x6657, 0x7676, 0x4615, 0x5634,
		0xD94C, 0xC96D, 0xF90E, 0xE92F, 0x99C8, 0x89E9, 0xB98A, 0xA9AB, 0x5844, 0x4865, 0x7806, 0x6827, 0x18C0, 0x08E1, 0x3882, 0x28A3,
		0xCB7D, 0xDB5C, 0xEB3F, 0xFB1E, 0x8BF9, 0x9BD8, 0xABBB, 0xBB9A, 0x4A75, 0x5A54, 0x6A37, 0x7A16, 0x0AF1, 0x1AD0, 0x2AB3, 0x3A92,
		0xFD2E, 0xED0F, 0xDD6C, 0xCD4D, 0xBDAA, 0xAD8B, 0x9DE8, 0x8DC9, 0x7C26, 0x6C07, 0x5C64, 0x4C45, 0x3CA2, 0x2C83, 0x1CE0, 0x0CC1,
		0xEF1F, 0xFF3E, 0xCF5D, 0xDF7C, 0xAF9B, 0xBFBA, 0x8FD9, 0x9FF8, 0x6E17, 0x7E36, 0x4E55, 0x5E74, 0x2E93, 0x3EB2, 0x0ED1, 0x1EF0};
	
	ptCur = ptStart;
	CRCValue = 0xFFFF;
	while(ptCur < ptEnd)
	{	
		CRCTableEntry = CRCValue / 0x0100;
		CRCTableEntry ^= (unsigned short)(*ptCur);
		CRCValue &= 0x000000FF;
		CRCValue *= 0x00000100;
		CRCValue ^= CRCTable[CRCTableEntry];
		ptCur++;
	}
	return CRCValue;
} // calcCRC


static int getCRC(char *ptTable)
{
	char			*ptStart;
	char			*ptEnd;
	
	if(ptTable==NULL)
		return -1;
	ptStart = getptStringStart(ptTable);
	ptEnd = getptStringEnd(ptTable);

	return calcCRC((unsigned char *)ptStart,(unsigned char *)ptEnd);
} // getCRC


static bool setCRC(char *ptTable)
{
	int	CRCValue;

	if(ptTable==NULL)
		return false;
	if((CRCValue=getCRC(ptTable))!=-1)
	{
		(*(unsigned short *)(ptTable + CRCOffset)) = (unsigned short)CRCValue;
		return true;
	}
	else
		return false;
} // setCRC


/////////////////
// Hash functions
/////////////////


static int getHash(char *ptKeyString, int HashTableSize)
{
	char			charValue;
	unsigned int	hashValue;
	char			*ptKeyStringChar;
	
	hashValue = 0;
	ptKeyStringChar = ptKeyString;
	while ((charValue = *ptKeyStringChar++) != '\0')
	{
		hashValue *= 0x10;
		hashValue += charValue;
		if ((hashValue & 0xF0000000) != 0)
		{
			unsigned int tempValue = hashValue & 0xF0000000;
			tempValue /= 0x01000000;
			hashValue &= 0x0FFFFFFF;
			hashValue ^= tempValue;
		}
	}
	return hashValue % HashTableSize;
} // getHash


///////////////////////////////////
// Internal string search functions
///////////////////////////////////


static int getString(char *ptTable, char *ptKeyString, char **ptString)
{
	int				HashTableSize;
	int				NumLoops;
	char			*ptFirstNode;
	char			*ptNode;
	int				HashValue;
	int				Loop;
	char			*ptIdxString;
	
	HashTableSize = getHashTableSize(ptTable);
	NumLoops = getNumLoops(ptTable);
	ptFirstNode = getptFirstNode(ptTable);
	HashValue = getHash(ptKeyString, HashTableSize);

	Loop = 0;
	while (Loop++ < NumLoops)
	{
		ptNode = ptFirstNode + NodeSize*HashValue;
		if (*(ptNode + ActiveOffset) == 1)	// add the offset first, then dereference
		{
			ptIdxString = getptIdxString(ptTable,ptNode);
			if (strcmp(ptIdxString, ptKeyString) == 0)
			{
				*ptString = getptNameString(ptTable,ptNode);
				return getIdxNum(ptNode);
			}
		}
		HashValue++;
		HashValue %= HashTableSize;
	}
	return -1;
} // getString


static char *getStringNum(char *ptTable, int KeyNum)
{
	char			*ptFirstNode;
	char			*ptNode;
	char			*ptElement;
	int				NodeNum;
	
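	if (KeyNum < 0 || KeyNum >= getNumElements(ptTable))
		return NULL;	// index lies outside the element table
	ptFirstNode = getptFirstNode(ptTable);
	ptElement = ptTable + HeaderSize + ElementSize*KeyNum;
	NodeNum = getNodeNum(ptElement);
	ptNode = ptFirstNode + NodeSize*NodeNum;
	if (*(ptNode + ActiveOffset) == 1)	// add the offset first, then dereference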
	{
		return getptNameString(ptTable,ptNode);
	}
	return NULL;
} // getStringNum


///////////////////////////////////
// Exported string search functions
///////////////////////////////////


int getNumStringByName(char *ptKeyString, char **ptString)
{
	int		IdxNbr;
	
	if ((!IsInit) || (ptKeyString == NULL))
	{
		*ptString = NULL;
		return -1;
	}
	if (ptPatchStringTable != NULL)
	{
		if ((IdxNbr = getString(ptPatchStringTable, ptKeyString, ptString)) != -1)
			// KeyString found in patchstring.tbl
			return IdxNbr + PatchStringKeyNum;
	}
	if (ptExpansionStringTable != NULL)
	{
		if ((IdxNbr = getString(ptExpansionStringTable, ptKeyString, ptString)) != -1)
			// KeyString found in expansionstring.tbl
			return IdxNbr + ExpansionStringKeyNum;
	}
	if (ptStringTable != NULL)
	{
		if ((IdxNbr = getString(ptStringTable, ptKeyString, ptString)) != -1)
			// KeyString found in string.tbl
			return IdxNbr + StringKeyNum;
	}

	// KeyString was not found
	*ptString = strNameNotFound;
	return -1;
} // getNumStringByName


char *getStringByName(char *ptKeyString)
{
	char *ptString = NULL;
 
	getNumStringByName(ptKeyString, &ptString);
	return ptString;
} // getStringByName


char *getStringByNum(int KeyNum)
{
	char	*ptString = NULL;
	
	if (!IsInit)
		return NULL;
	if (KeyNum >= ExpansionStringKeyNum)
	{
		if(ptExpansionStringTable != NULL)
		{
			if ((ptString = getStringNum(ptExpansionStringTable, KeyNum - ExpansionStringKeyNum)) != NULL)
				// KeyNum found in expansionstring.tbl
				return ptString;
		}
	}
	else if (KeyNum >= PatchStringKeyNum)
	{
		if (ptPatchStringTable != NULL)
		{
			if ((ptString = getStringNum(ptPatchStringTable, KeyNum - PatchStringKeyNum)) != NULL)
				// KeyNum found in patchstring.tbl
				return ptString;
		}
	}
	else 
	{
		if (ptStringTable != NULL)
		{
			if ((ptString = getStringNum(ptStringTable, KeyNum - StringKeyNum)) != NULL)
				// KeyNum found in string.tbl
				return ptString;
		}
	}

	// KeyNum was not found
	return strNameNotFound;
} // getStringByNum


/////////////////
// initialization
/////////////////


static bool initTable(char **ptTable, char ptFileName[])
{
	FILE			*Source;
	char			Header[HeaderSize];
	unsigned int	FileSize;

	if ((Source=fopen(ptFileName, "rb")) == NULL)
//	if ((Source=fileopen(ptFileName, "rb")) == NULL)
		return false;
	bool IsOK = false;
	if (fread(Header, sizeof(char), sizeof(Header), Source) == sizeof(Header))
	{
		FileSize = getFileSize(Header);
		allocateMemory(ptTable, FileSize);
		rewind(Source);
		if (fread(*ptTable, sizeof(char), FileSize, Source) == FileSize)
			IsOK = true;
		else
		{
			free(*ptTable);
			*ptTable = NULL;
		}
	}
	fclose(Source);
//	fileclose(Source);
	return IsOK;
} // initTable


static void writeResult(FILE *WriteDestination, char *strText)
{
	if (WriteDestination != NULL)
		fprintf(WriteDestination,"Found: %s\n", strText);
} // writeResult


bool initStringTables(FILE *WriteDestination)
{
	if (!IsInit)
	{
		if (initTable(&ptStringTable, strStringFilename))
			writeResult(WriteDestination, strStringFilename);
		if (initTable(&ptExpansionStringTable, strExpansionStringFilename))
			writeResult(WriteDestination, strExpansionStringFilename);
		if (initTable(&ptPatchStringTable, strPatchStringFilename))
			writeResult(WriteDestination, strPatchStringFilename);
		IsInit = true;
	}
	return IsInit;
} // initStringTables


bool closeTable(char **ptTable)
{
	// Takes the address of the table pointer so that the caller's
	// pointer is reset to NULL once the memory has been freed.
	if (ptTable != NULL && *ptTable != NULL)
		deallocateMemory(ptTable);
	return true;
} // closeTable

bool closeStringTables(void)
{
	if (IsInit)
	{
		closeTable(&ptStringTable);
		closeTable(&ptExpansionStringTable);
		closeTable(&ptPatchStringTable);
		IsInit = false;
	}
	return true;
} // closeStringTables


////////////////
// testing stuff
////////////////


void writefile(const char filename[], char *ptTable)
{
 	FILE	*target;

	if (ptTable == NULL)	// table was not loaded
		return;
	if ((target=fopen(filename,"w"))==NULL)
		return;

	unsigned short	NumElements;
	int				HashTableSize;
	int				NumLoops;
	char			*ptFirstNode;
	char			*ptNode;
	char			*ptElement;
	int				NodeNum;
	int				idx;
	bool			IsActive;
	
	NumElements = getNumElements(ptTable);
	HashTableSize = getHashTableSize(ptTable);
	NumLoops = getNumLoops(ptTable);
	ptFirstNode = getptFirstNode(ptTable);
	
	fprintf(target,"%s\n",filename);
	fprintf(target,"Elements: %d, Hashs: %d, Loops: %d\n",NumElements,HashTableSize,NumLoops);
	fprintf(target," Num    EIdx Act HEIdx   Hash    Len   String\n");
	for(idx=0;idx<HashTableSize;idx++)
	{
		ptNode = ptFirstNode + NodeSize*idx;
		IsActive = (*(ptNode + ActiveOffset) == 1);

		fprintf(target,"%5d",idx);
		if(idx<NumElements)
		{
			// only the first NumElements rows have an element table entry
			ptElement = ptTable + HeaderSize + ElementSize*idx;
			NodeNum = getNodeNum(ptElement);
			fprintf(target,"  %5d",NodeNum);
		}
		else
			fprintf(target,"       ");
		fprintf(target,"  %1d  %5d  %5d  %5d",*(ptNode + ActiveOffset),(*(unsigned short *)(ptNode + IdxNbrOffset)),(*(int *)(ptNode + HashValueOffset)),(*(unsigned short *)(ptNode + NameLenOffset)));
		// inactive nodes may not hold valid string offsets, so print nothing for them
		fprintf(target,"   %-25s", IsActive ? getptIdxString(ptTable,ptNode) : strNull);
		fprintf(target,"   %-80s", IsActive ? getptNameString(ptTable,ptNode) : strNull);
		fprintf(target,"\n");
	}
	fclose(target);
} // writefile


void teststringtable(void)
{
	if(initStringTables(stdout))
	{
		getCRC(ptStringTable);
		getCRC(ptPatchStringTable);
		getCRC(ptExpansionStringTable);
		writefile("infostring.txt", ptStringTable);
		writefile("infopstring.txt", ptPatchStringTable);
		writefile("infoestring.txt", ptExpansionStringTable);
	}
} // teststringtable
In case anyone wonders, the header file looks like this:

Code:

// stringtable.h

#include <stdio.h>	// for FILE

// ptString is an out parameter, matching the definition in stringtable.cpp
extern int getNumStringByName(char *ptKeyString, char **ptString);
extern char *getStringByName(char *ptKeyString);
extern char *getStringByNum(int KeyNum);
extern bool initStringTables(FILE *WriteDestination);
extern bool closeStringTables(void);

extern void teststringtable(void);
Last edited by Jarulf on Mon Feb 16, 2004 12:55 am, edited 8 times in total.

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Post by Jarulf » Wed Jan 08, 2003 2:51 pm

Myhrginoc wrote:
Additionally, there are two word values at offset +02 and +04 that contain the number of string records in the file. I don't know why two values, and the second value is never less than the first as far as I have seen. Perhaps the first value is the number of strings and the second is the maximum number of strings the file has ever had.

Actually, the value at +02 holds the number of "elements" in the file, that is, the number of entries in the part of the file where you look up a string by number. The value at +04 (a dword rather than a word, see the code above) holds the hash table size, that is, the number of entries in it. Those two do not necessarily have to be the same. The reason the second is at times larger is that the hash table in some string table files holds empty entries, while the element table never seems to do that.

Also note that the game code never uses the version number for anything, probably because Blizzard has never updated it and hence all string tables are version "0".

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Re: string table format?

Post by Joel » Wed Jan 08, 2003 6:11 pm

Just a question.
My tbl reading code is more or less like yours, Jarulf, but I get the worst performance I have ever had for loading it (it takes some seconds to load string.tbl).
What reading time did you get with this one?

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Wed Jan 08, 2003 7:26 pm

Joel wrote: Just a question.
My tbl reading code is more or less like yours, Jarulf, but I get the worst performance I have ever had for loading it (it takes some seconds to load string.tbl).
What reading time did you get with this one?
Ehh, you mean reading the file into memory? I would say instantly. I use the standard C function for reading a file (fread) and read it in two steps: first the header to find the file size, and then (after a rewind) the whole file. I never clocked it or anything. I have the .tbl files extracted and in some folder on the hard disk.

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Re: string table format?

Post by Joel » Wed Jan 08, 2003 7:29 pm

OK... I've just tried your code, and it seems the only difference between my code and yours is that I need to copy the strings into string objects for further editing, so there is a lot of new/delete in nested loops...
quite bad :(

I will rework it.

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Thu Jan 09, 2003 12:04 am

Joel wrote: OK... I've just tried your code, and it seems the only difference between my code and yours is that I need to copy the strings into string objects for further editing, so there is a lot of new/delete in nested loops...
quite bad :(

I will rework it.
Ahh yeah, that would severely hit performance. Of course, it might not matter much if it is just a one-time initialization, but if you are like me, you would hate it and recode it to be faster :)

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Re: string table format?

Post by Joel » Thu Jan 09, 2003 10:16 pm

Paul just gave me an idea:

After opening the tbl, I DON'T read it as a whole.
I use file mapping to get the keys and display them in a listbox.
Then I only allocate string memory for strings that are currently being edited.

And it's unlikely that a single user will edit ALL strings at once...

Just by using file mapping I've gone from 13.12 s for loading a tbl to 0.15 s...
No comment ;)

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Fri Jan 10, 2003 1:26 am

Joel wrote: Paul just gave me an idea:

After opening the tbl, I DON'T read it as a whole.
I use file mapping to get the keys and display them in a listbox.
Then I only allocate string memory for strings that are currently being edited.

And it's unlikely that a single user will edit ALL strings at once...

Just by using file mapping I've gone from 13.12 s for loading a tbl to 0.15 s...
No comment ;)
Why not actually read the file all at once into some kind of buffer, then move a string into your string memory when editing it? That seems better (using some memory moves as needed) than having the file open all the time (or opening/closing it) and accessing it each time you mess with a string.
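
A minimal sketch of that idea, assuming the whole file is already in one byte buffer and strings are addressed by their file offset: a string is only copied into its own `std::string` the first time it is edited. The class name `TableBuffer` and its methods are hypothetical and not taken from either poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: holds the raw file bytes plus copies of edited strings.
class TableBuffer {
public:
    explicit TableBuffer(std::vector<char> fileBytes)
        : bytes_(std::move(fileBytes)) {
        // Guarantee a terminator so reading a string never runs off the end.
        if (bytes_.empty() || bytes_.back() != '\0') bytes_.push_back('\0');
    }

    // Current text of the string at a file offset: the edited copy if one
    // exists, otherwise the bytes still sitting in the buffer.
    std::string current(std::size_t offset) const {
        auto it = edits_.find(offset);
        if (it != edits_.end()) return it->second;
        if (offset >= bytes_.size()) return std::string();
        return std::string(bytes_.data() + offset);
    }

    // Copies the string out of the buffer the first time it is edited.
    std::string &edit(std::size_t offset) {
        auto it = edits_.find(offset);
        if (it == edits_.end())
            it = edits_.emplace(offset, current(offset)).first;
        return it->second;
    }

    std::size_t editedCount() const { return edits_.size(); }

private:
    std::vector<char> bytes_;                   // the file, read once
    std::map<std::size_t, std::string> edits_;  // only strings that were edited
};
```

Only edited strings cost an allocation, so loading stays a single read. Saving still means rebuilding the file, since offsets and lengths change.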

Evil Peer
Junior Member
Paladin
Posts: 150
Joined: Thu Jul 25, 2002 4:09 am
Location: Indianapolis, IN, USA

Re: string table format?

Post by Evil Peer » Fri Jan 10, 2003 8:21 am

Joel wrote: After opening the tbl, I DON'T read it as a whole.
I use file mapping to get the keys and display them in a listbox.
Then I only allocate string memory for strings that are currently being edited.

And it's unlikely that a single user will edit ALL strings at once...
Er, you're not talking about just reading strings from the file as you need them, are you? Because you can't edit the strings in place (with offsets and lengths and CRCs and all). You are just postponing the inevitable, since you will have to completely recreate the file when it is saved.

What someone else suggested, reading the whole file into memory first, would actually work though. I changed a ROM editing program I wrote to do just that and saw a monumental speed increase (an order of magnitude).

---Evil Peer

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Re: string table format?

Post by Joel » Fri Jan 10, 2003 9:38 am

Actually, if I want to do it that way, I can...
I've packed some low-level Win32 file-mapping functions into a usable class, and this gives me random file access with no speed overhead.
The file is opened once and kept open as long as I want to use it.
However, the Win32 API lets me open it with sharing flags, allowing other processes to access it.

It's like reading the file into memory, but it is all done by the kernel and with minimal memory impact :)

Actually, I open the file and get a memory image of it, then I read data from this image in the same way I would read a simple unsigned char[].
For some formats it works very well: I can read and write the file concurrently at will (like DC6 and other non-compressed, non-CRCed files).
For tbl, of course, the mere presence of the CRC, offset table and so on prevents me from doing this. So I only reallocate string space when the user edits one of them.

At save time, the only thing I have to do is recalculate the hashes and the CRC, but the big string chunk of the file is already allocated...

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Fri Jan 10, 2003 10:25 am

As far as I know, the CRC is never used or checked. I think Ondo in his program (that initial Perl stuff) might have skipped it. Or at least he mentioned he tested it that way and it worked great. One needs to update the hashes and string entries, of course, if one edits strings.
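
For anyone who wants to write the value anyway: the lookup table in calcCRC above starts 0x0000, 0x1021, ..., which is the table for polynomial 0x1021, and the start value is 0xFFFF. That appears to match the common CRC-16 variant usually called CCITT-FALSE. Below is a table-free sketch of the same calculation. The function name `crc16` is made up here, and whether the game ever checks the result is exactly what this thread is unsure about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bit-by-bit CRC-16 with polynomial 0x1021 and initial value 0xFFFF,
// processing the most significant bit first and applying no final XOR.
inline std::uint16_t crc16(const unsigned char *data, std::size_t length) {
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < length; ++i) {
        // Bring the next byte into the high half of the register.
        crc = static_cast<std::uint16_t>(crc ^ (data[i] << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000)
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            else
                crc = static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}
```

Note that getCRC in the posted code runs over the string section only (from the start of the strings to the end of the file), not over the header or the two tables.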

Joel
Moderator
Dominion
Posts: 6921
Joined: Mon May 27, 2002 7:19 am
Location: Orsay

Hand-picked

Re: string table format?

Post by Joel » Fri Jan 10, 2003 10:54 am

I don't think so. When I first tried to write a tbl, my CRC value was wrong and the game crashed on startup.

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Fri Jan 10, 2003 3:18 pm

Joel wrote: I don't think so. When I first tried to write a tbl, my CRC value was wrong and the game crashed on startup.
Strange, Ondo said he had used a blank CRC with no problem. I will check, just for the sake of it.

Evil Peer
Junior Member
Paladin
Posts: 150
Joined: Thu Jul 25, 2002 4:09 am
Location: Indianapolis, IN, USA

Re: string table format?

Post by Evil Peer » Tue Jan 14, 2003 9:36 pm

Jarulf wrote: As far as I know, the CRC is never used or checked. I think Ondo in his program (that initial Perl stuff) might have skipped it. Or at least he mentioned he tested it that way and it worked great. One needs to update the hashes and string entries, of course, if one edits strings.
Er, I meant the hash function when I said CRC.

---Evil Peer

MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

Re: string table format?

Post by MAKF1127 » Sat Mar 01, 2003 6:15 pm

Hmmm, that code kind of confuses me, as I'm not very good at C/C++ yet :oops: ... Is there by chance a document of the specs?
-MAKF1127

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Sat Mar 01, 2003 6:42 pm

MAKF1127 wrote: Hmmm, that code kind of confuses me, as I'm not very good at C/C++ yet :oops: ... Is there by chance a document of the specs?

Look at the start of it; that basically tells the layout of the file :)

Let's see, I will do this from memory, so I hope I don't get anything wrong.

First comes a header:

// Header info location
CRCOffset = 0x00, // word

First there is a word-sized CRC value. As far as I know it is not really used or checked, so you can actually ignore it.

NumElementsOffset = 0x02, // word

This tells the number of entries in the "element" section.


HashTableSizeOffset = 0x04, // dword

Same for the hash table: the number of entries.

VersionOffset = 0x08, // byte

This is the version of the file. Since there has only ever been one version, it should be 0 in all files (as noted in the code comment above).


StringStartOffset = 0x09, // dword

This value tells at what offset in the file the strings are stored.

NumLoopsOffset = 0x0D, // dword

This value is part of the hash lookup: basically the maximum number of "misses" (probes) you can have for it.

FileSizeOffset = 0x11, // dword


This tells the size of the file.
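
As a rough illustration of the header layout just described, here is a sketch that decodes the 21 bytes into a struct. It assumes the multi-byte values are little-endian (the posted code reads them with native casts on a PC). The names `TblHeader`, `readU16`, `readU32` and `parseHeader` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct; the fields follow the header constants in the posted code.
struct TblHeader {
    std::uint16_t crc;            // +0x00, word
    std::uint16_t numElements;    // +0x02, word
    std::uint32_t hashTableSize;  // +0x04, dword
    std::uint8_t  version;        // +0x08, byte
    std::uint32_t stringStart;    // +0x09, dword
    std::uint32_t numLoops;       // +0x0D, dword
    std::uint32_t fileSize;       // +0x11, dword
};

// Assumption: values are stored little-endian.
inline std::uint16_t readU16(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t readU32(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns false if the buffer is too short to hold the 21-byte header.
inline bool parseHeader(const unsigned char *data, std::size_t size,
                        TblHeader &out) {
    const std::size_t kHeaderSize = 0x15;
    if (data == nullptr || size < kHeaderSize) return false;
    out.crc = readU16(data + 0x00);
    out.numElements = readU16(data + 0x02);
    out.hashTableSize = readU32(data + 0x04);
    out.version = data[0x08];
    out.stringStart = readU32(data + 0x09);
    out.numLoops = readU32(data + 0x0D);
    out.fileSize = readU32(data + 0x11);
    return true;
}
```

Reading byte by byte like this avoids the unaligned pointer casts of the original code, which work on x86 but are not portable.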

Following this header comes the element table.

It has the number of entries that was listed above in the header. Each entry is just a word-sized value indicating which entry in the hash table corresponds to it. So if we want to fetch entry 45, the game will look at the 46th entry of the element table (index 45, counting from zero). There it finds a number, for example 23 (I am just making up values here); thus string 45 will be handled as entry number 23 in the hash table. This makes it possible to look up strings faster than by hashing and then searching (so to speak), or, if you don't want to save a found string, gives a fast method to look it up again later. The game uses this method for some strings.
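
The lookup by number can be sketched like this, assuming the whole file sits in one buffer and values are little-endian. `nodeForIndex` and `readU16` are hypothetical names, and the bounds checks are additions that the game itself may or may not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: values are stored little-endian.
inline std::uint16_t readU16(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Returns the hash-table node number stored for string `index`,
// or -1 if the index is out of range or the buffer is too short.
inline int nodeForIndex(const unsigned char *table, std::size_t tableSize,
                        unsigned index) {
    const std::size_t kHeaderSize = 0x15;  // element table follows the header
    const std::size_t kElementSize = 2;    // one word per element
    if (table == nullptr || tableSize < kHeaderSize) return -1;
    const unsigned numElements = readU16(table + 0x02);
    if (index >= numElements) return -1;
    const std::size_t offset = kHeaderSize + kElementSize * index;
    if (offset + kElementSize > tableSize) return -1;
    return readU16(table + offset);
}
```

With the made-up values from the text, `nodeForIndex(table, size, 45)` would return 23.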

Next in the file comes the hash table, where each entry is called a "node". Each node has the following layout:

// Node info location
ActiveOffset = 0x00, // byte

This is just a flag telling if the entry is actually "active" and exists at all. Most entries should be, but occasionally you find one that is not.

IdxNbrOffset = 0x01, // word

This tells which entry in the element table this node corresponds to. At that entry you should thus find a value equal to this node's position in the hash table.

HashValueOffset = 0x03, // dword

This holds the hash value for the string (search string) that this entry has; see below about hash values.


IdxStringOffset = 0x07, // dword

This tells the offset of the search string (it will be somewhere inside the string section of the file). Note that all offsets are relative to the start of the file.

NameStringOffset = 0x0B, // dword

This holds the offset of the actual string that corresponds to the search string.


NameLenOffset = 0x0F, // word

This holds the length of that string.
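
The node layout above can be decoded with a sketch like the following, again assuming little-endian values. `TblNode`, `parseNode` and the read helpers are names invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct; the fields follow the node constants in the posted code.
struct TblNode {
    bool          active;       // +0x00, byte
    std::uint16_t indexNumber;  // +0x01, word: entry in the element table
    std::uint32_t hashValue;    // +0x03, dword
    std::uint32_t keyOffset;    // +0x07, dword: offset of the search string
    std::uint32_t valueOffset;  // +0x0B, dword: offset of the actual string
    std::uint16_t valueLength;  // +0x0F, word
};

// Assumption: values are stored little-endian.
inline std::uint16_t readU16(const unsigned char *p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t readU32(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// `node` must point at 17 readable bytes (one hash-table entry).
inline TblNode parseNode(const unsigned char *node) {
    TblNode n;
    n.active = (node[0x00] == 1);
    n.indexNumber = readU16(node + 0x01);
    n.hashValue = readU32(node + 0x03);
    n.keyOffset = readU32(node + 0x07);
    n.valueOffset = readU32(node + 0x0B);
    n.valueLength = readU16(node + 0x0F);
    return n;
}
```

The fields add up to 1 + 2 + 4 + 4 + 4 + 2 = 17 bytes, which matches NodeSize = 0x11 in the code.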

So, how does one normally look up a string? Well, you provide a lookup string. Now we have to hash that lookup string. That basically returns a value that corresponds to the lookup string. In an ideal situation (a fully optimized hash table), each lookup string would have its own unique hash value. However, that is not the case here: several different lookup strings can give the same hash value.
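
To see such a collision, here is a standalone restatement of the getHash function from the code posted earlier. The name `hashKey` is made up, and the cast to `signed char` is an assumption that mirrors compilers where plain `char` is signed. For plain ASCII keys it makes no difference.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-add string hash with the top nibble folded back in,
// reduced modulo the hash table size.
inline std::uint32_t hashKey(const char *key, std::uint32_t hashTableSize) {
    if (key == nullptr || hashTableSize == 0) return 0;
    std::uint32_t h = 0;
    for (const char *p = key; *p != '\0'; ++p) {
        h = (h << 4) + static_cast<std::uint32_t>(static_cast<signed char>(*p));
        const std::uint32_t top = h & 0xF0000000u;
        if (top != 0) {
            h &= 0x0FFFFFFFu;  // clear the top nibble...
            h ^= top >> 24;    // ...and fold it into the low bits
        }
    }
    return h % hashTableSize;
}
```

With a made-up table size of 10, the keys "a" (character code 97) and "k" (character code 107) both land in slot 7, which is exactly the kind of clash the search described next has to resolve.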

Anyway, we now take the hash value and it correspond to the entry in the hash table. Lets assume we got the hash value of 55. We then look at node number 55 in the hash table. We can now if we want verify that this entry indeed has the hash value of 55 (see above for the part of the node that holds the hash value of it).

Now, as several look up strings can give this hash value, we need to compare the actual look up string with that of this node (the offset of which is give by the node, right? If it match, we have found what we are looking for and can get the string, which offset is found as the next to last entry of the node.

If the lookup string for this entry was different, we need to search further. We do that by checking the next node, node number 56 in this case (when we reach the last node, we restart at the first one, by the way), again comparing the lookup string of that entry with ours. We continue until we find a node whose lookup string matches ours OR until we have searched the maximum number of nodes needed (as given by the Loop entry in the header), since by then we can be sure that the string table file does not contain the lookup string we gave.
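The search described above could be sketched roughly like this. It is only an illustration under some assumptions: the nodes are already decoded into a hypothetical LookupNode with the key and value strings read in, the hash of the lookup string is computed elsewhere and passed in, the start node is taken as the hash modulo the number of nodes, and inactive nodes are simply skipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of a node, with both strings already read
// from the file via the offsets stored in the node.
struct LookupNode {
    bool        active;
    std::string key;    // the search string
    std::string value;  // the actual string
};

// Returns a pointer to the value string, or nullptr if the key is not found.
// keyHash:  hash of `key`, computed elsewhere (the hash function is not shown).
// maxLoops: the Loop entry from the header, the most nodes we need to check.
const std::string* findString(const std::vector<LookupNode>& table,
                              const std::string& key,
                              std::uint32_t keyHash,
                              std::size_t maxLoops) {
    if (table.empty()) return nullptr;
    // Assumption: the hash value selects the starting node like this.
    std::size_t pos = keyHash % table.size();
    for (std::size_t tries = 0; tries < maxLoops; ++tries) {
        const LookupNode& node = table[pos];
        // Several keys can share a hash, so the strings must be compared.
        if (node.active && node.key == key) return &node.value;
        // Move to the next node, wrapping to the first after the last.
        pos = (pos + 1) % table.size();
    }
    return nullptr;  // checked maxLoops nodes, so the key is not in the file
}
```

For example, if "a" and "b" both hash to 1 and "b" was pushed to node 2, then looking up "b" checks node 1 first, finds a different key, and succeeds on node 2.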


Hmm, did this make any sense? The code I have provides several ways to fetch the string: one returns the pointer to the string, another returns the string PLUS its entry number in the element table. Another function makes it possible to search by the index number in the element table instead (there could be some other methods too, I'm not sure). I also have some functions for calculating the CRC value, which is not needed at all to read string table files. At the end, I noticed that some test functions I used to check that I had it all correct are still there. Also, the code has functions to actually read the files into allocated memory (and to release that memory again later).
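For the lookup by index number, a rough sketch could look like the following. This is not the actual code mentioned above, just an illustration that assumes the element table has been read into a vector of words and the nodes decoded into a hypothetical IndexedNode.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decoded node; only the parts needed here.
struct IndexedNode {
    bool        active;
    std::string key;
    std::string value;
};

// elements[i] holds the position in the hash table of the node for
// string number i (two bytes per entry in the file).
// Returns nullptr if the number is out of range or the node is inactive.
const std::string* stringByIndex(const std::vector<std::uint16_t>& elements,
                                 const std::vector<IndexedNode>& table,
                                 std::size_t stringNumber) {
    if (stringNumber >= elements.size()) return nullptr;
    std::size_t nodePos = elements[stringNumber];
    if (nodePos >= table.size() || !table[nodePos].active) return nullptr;
    return &table[nodePos].value;
}
```

No hashing or string comparison is needed on this path, since the element table points straight at the node.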

If you still don't get it or have questions, please feel free to ask again :)

User avatar
MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

Re: string table format?

Post by MAKF1127 » Thu Mar 06, 2003 5:17 pm

Alright, I turned this into a table to make it more clear. Is THIS correct? Also, I may have a few questions on the offset descriptions as well :). Thanks a lot
-MAKF1127
Last edited by MAKF1127 on Thu Mar 06, 2003 5:17 pm, edited 1 time in total.
-MAKF1127

Jarulf
Junior Member
Champion of the Light
Posts: 346
Joined: Sun May 26, 2002 9:20 am

Hand-picked

Re: string table format?

Post by Jarulf » Fri Mar 07, 2003 1:40 pm

[quote="MAKF1127";p="74961"]Alright, I turned this into a table to make it more clear. Is THIS correct? Also, I may have a few questions on the offset descriptions as well :). Thanks a lot
-MAKF1127[/quote]


Yes, it looks correct. Note that I have been a bit inconsistent with what I call "offset". In a name like "IdxStringOffset", I typically mean the offset within the node of the field that tells where the index string is. The value stored in that field is itself an offset, of course. But I think you got it, since it looks OK.

Also, I checked and indeed, the version value is always 0 in all files so far.

What are the questions you had?

User avatar
MAKF1127
Posts: 97
Joined: Sun Jun 30, 2002 12:11 am
Location: Colorado, USA

Re: string table format?

Post by MAKF1127 » Sat Mar 08, 2003 7:00 am

What I did was create a small string table with darkstorms table editor, and I examined it according to the table I made from what you said.

Okay, here are a few:

Questions for the actual offset descriptions:

1.) Just what exactly is the "NumLoopsOffset"?

2.) What difference does it make whether an entry is active or not? Is it like this? If I have an entry called hax in patchstring.tbl and one in expansionstring.tbl (it checks patchstring.tbl first, IIRC), and the one in patchstring.tbl is not active but the one in expansionstring.tbl is active, will it take the first active one?

3.) IdxNbrOffset <== As I understand it, this is just what index it is in the hash table... But when I looked at my test file, the second node's value here was zero...

4.) What exactly is HashValueOffset?

5.) What exactly is NameStringOffset?

6.) NameLenOffset <== My first entry had a key of "Test String1" and a value of "OneTwoThree"... Then I looked at this offset for the first node, and I got something like 36... Shouldn't it be 11? (OneTwoThree = 11 chars)

General File Structure Questions:

1.) What is the purpose of an element table? If I wanted the 45th node in the hash table, can't I just go right to it without an element table?

2.) When searching for things like item names, etc., do you just do a sequential search on all the keys?

Thanks a ton :)
MAKF1127
Last edited by MAKF1127 on Sat Mar 08, 2003 7:04 am, edited 1 time in total.
-MAKF1127
