Anagrams / Word lists

Blitz3D Forums/Blitz3D Beginners Area/Anagrams / Word lists

_PJ_(Posted 2010) [#1]
I'm writing a simple word / anagram game in the style of "Countdown" for those who know the Channel 4 program.

Essentially 9 letters are randomly selected and the player(s) have 30 seconds to make the longest anagram they can from those letters. The words must be between 3 and 9 letters l;ong, with no slang or proper nouns.

Everything's going well, and the computer AI guesses from a list.

My concern, is that of course, slower CPUs will make fewer guesses than a faster CPU.

To enable AI guessing, the program searches lists of words in files.
The dictionary lists are separated by Initials, i.e. 26 lists one each for a, b etc. and arranged alphabetically.
This makes searches faster in some respects, but also means that there's no real preference over the length of words.

(There's no way to identify the length of a guessed word, util, of course, that word is READ from the file.)

If I arranged the words by length, then the checking of the validity of the word (i.e. are ALL the letters available) would take longer per 'guess', since EVERY letter in the word would need to be checked, but of course, I'm now checking the longer words first...

With large enough dictionary lists, it's possible that slower CPU wont even check the entire dictionaries. Therefore, some guesses will NEVER be encountered.

I am considering some method of tracking the time it takes to make guesses, i.e. if it takes 5 seconds for the computer to scan through the first letter's dictionary, then it's taking too long, since ideally for 9 letters, to scan for all guesses in 30 seconds, would need around 3 seconds each.

Of course, there's some difference in the length of the dictionary lists, but this is neglible.

I store all the guesses attempted in a type, along with their lengths so that I can adjust the output later to reflect a "SKILL LEVEL" this seems more appropriate considering the differences that may presented just due to hardware.

I'm not sure how to deal with the effect of hardware slowdown.
Frame-limiting could restrict 'superfast' computers, but the issue is more to do with slower processing.

One consideration may be to suggest a "minimum system requirement", though this still wouldn't affect the possibility of , say, running a virus checker in the background, which would cause the AI to complete fewer guesses in the allotted time.

Does anyone have any suggestions/ideas for this?

Would it possibly be better to hold the dictionaries in RAM?
The average filesize of a dictionary I am using is around 30k

30k * 26 letters ~ 800MB
This seems a bit heavy :S


Warner(Posted 2010) [#2]
Have you concidered cheating? Instead of trying to make an 'honest' AI, you could choose one of the words from the dictionary, and scramble that to start off the game. Then the AI allready knows what word can be made of these letters. You would then need to find a way to emulate the computer not finding the word somehow.
I think it would also contribute to the 'solvability' (not sure if that is a word) of the game. If you randomly choose 9 letters, and they are all consonant, it could be hard to create a word from it.


_PJ_(Posted 2010) [#3]
For a typical anagram, that would be the best solution, but in this case, it doesn't apply too well, I think.

That wont help in this case, Warner, since there would always be a 'x' letter word, which the computer would know.
The chaces of their being a longer word in the overall selection is much more unlikel;y, and if a 9 letter word is chosen to be re-arranged, then it would render the AI pretty much unbeatable. Also, less chance to even allow for a 'second-best' guess offering.

I store the valid 'guesses' that are chosen, so that after the time-critical period is over, the program can decide on which guess to offer (i.e. longer or shorter words) depending on 'Skill Level'.

Also, since the distribution of letters is from specific amounts of each, assembled into consonants and vowels, then the computer AI can be forced to select between x and y vowels and u and v consonants for example.
A human player is free to choose all consonants or all vowels should they wish... they still need to make a word as well :D
I appreciate, this isn't something anyone'd realyl be aware of unless they know the TV show.


_PJ_(Posted 2010) [#4]
TO add..

the issue I have is not HOW to solve the anagrams, but in the time it takes to acquire guesses varying due to CPU speed.


Warner(Posted 2010) [#5]
Well, I would think that if you want the same result on slow PC's as you'd get on a faster PC, you would need to adjust the program to the slow PC, and tune the faster PC's down.
So, I would put my effords into finding an optimal searching routine that would run as you want it on slower PC's.
Maybe it could help searching to sort each words letters alphabetically?
anagram = pirea->aeipr
dictionary word = ape->aep
dictionary word = pea->aep
Also, I would look into binary tree searching, and offcourse do a Google search on "solve anagram algorithm".


_PJ_(Posted 2010) [#6]
So, I would put my effords into finding an optimal searching routine that would run as you want it on slower PC's.
Maybe it could help searching to sort each words letters alphabetically?


That's the plan.


So far, currently I have my words arranged in 26 "dictionary files", one for each letter of the alphabet. This should make it quicker at least to find words beginning with say, "z" because it can start right at the 'z' dictionary, rather than go through a-y first.

The real issue, though, iws the fact that more than one word may be formed.
To try and give a simplified example, imagine the nine letters were:

REASONING

(convenient, huh? ;) )

Anyway, the search begins, first looking for words that begin with 'R' so the first dictionary file it checks is the 'R' dictionary.
The alphabetically listed words inside 'R' range from, say, RABBLE to RYES or something, for the sake of argument, let's assume the first VALID (i.e. can be made from the letters available) word that is found, is "RAGE". This word is stored in a Type, and then the very next word, is "RAGES"
This is in fact, a "better" guess, since it is valid AND is longer.

However, making valid guesses takes longer, as an invalid word, i.e. "RABBLE" would quit checking that word as soon as the checking function reached the first "B" (the third character).

Therefore, it's possible, that words such as "REASONING"l, while blatantly obvious, may not be ound, if there are too many other valid words in the dictionary beforehand - which push the 'time taken' on 'guessing per letter' over a pre-set limit.
Without such a limitt, more words could be found, but there's that it would not find as many potential valid words, or that the time taken to scour the lists would far exceed the 30 seconds overall.

Also, it's not so much, slower PCs working as fast PC's would, but also that fast PC's may well be 'Too Good'.

---

You've made a really good point about working to the format of the slower PCs, making that a kinda benchmark, and slowing down anything that's faster. This would seem the best avenue to take, just not sure how to necessarily know HOW to go about this...

I think what I'll do for now, is quit worrying about it as yet, continue with how it works for now, and onec I have something 'playable' I can ask some folks here to est it and ghopefully get some speed feedback from trhe blitzers' PC specs ranges :)


_PJ_(Posted 2010) [#7]
I'm wondering now, whether it may even be somewhat faster to preload the dictionaries into memory - of course, this would speed things up in the time-critical phase, but would make for a 'loading time' pause at the initialisation, and more importantly, if a machine is 'too slow' to iterate through dictionary lists in time, perhaps too, it may not have the memory space and caching abilities to do this...

I'm still plugging away to reach a really useable test evrsion, but since I need to write out my own dictionary lists, it's taking some time (I'm, on "D" now :D)

There ARE available lists on the internet, but these all seem to be:
A. American 'English'
B. Contain proper nouns
C. Other invalidating reasons.