Anagrams - Combinations/Perumatations

Blitz3D Forums/Blitz3D Beginners Area/Anagrams - Combinations/Perumatations

Matt Vinyl(Posted 2007) [#1]
Hi all,

I'm putting together a word game (using B3D) but am having a bit of a headache with one particular part.

Basically, an 8-letter word is picked at random from a dictionary, and each of the 8 letters are then placed in an array.

How can I then 'check' (against the dictionary) all words that can be made from these letters? (i'm limiting it to 3-8 letter words, so things like that 'at' and 'so' are not necessary.

One thing I also need to take into account, is that if a word contains more than one of the same letters, I don't want it to count the words twice. For example, if the 8 letter word was 'ARRANGED' I wouldn't want to include the word 'ARE' twice.

Let me know if you need any further info!

:)

Cheers,
Matt...


Beaker(Posted 2007) [#2]
Couldn't you just use an existing anagram server and parse the resulting page?:
http://www.ssynth.co.uk/~gay/cgi-bin/nph-an?line=fawlty+towers&words=3&dict=antworth
http://wordsmith.org/anagram/anagram.cgi?anagram=fawlty+towers&language=english&d=3&include=&exclude=&n=3&m=&source=adv&a=n&l=n&q=n


Matt Vinyl(Posted 2007) [#3]
Hmm, quite possible. Trouble is, there's about 80000+ 8 letter words that are included in my dictionary, which means 'pre-parsing' any data manually wouldn't really be an option. It really needs to be done 'dynamically'. (My dictionary is just a .txt file that has been imported into an array...

Cheers though!
M.


Sledge(Posted 2007) [#4]
I was having a little think about this and quickly realised that it's way more complex than I'd anticipated, mostly because (as an erstwhile hobbyist coder) I'm used to iterating rather than permutating. (And why wouldn't I be... my spell checker doesn't even think 'permutating' is a real word!)

But I digress, there's a neat article that explains what permutations logically are (and therefore how to build the blasted things) here. Made me go "Oooh, of course!".


Matt Vinyl(Posted 2007) [#5]
Oooh, good stuff! I'll take a look at that.

Agreed, it at first sounded simple, but soon became a lot more involved than I thought it would! :)


H&K(Posted 2007) [#6]
Well thats because permutating isnt a real word


Sledge(Posted 2007) [#7]
Pfft. 'Conscientization' isn't a real word either but it's still of vital importance to those who use it. Permutating permutating permutating permutating. [Does little permutating dance]


H&K(Posted 2007) [#8]
HA Fool ;)
Ive nothing against you using any madeup word you want. I was explaining to you the reason your spellchecker didnt think it was a real word, was because it wasnt.


jfk EO-11110(Posted 2007) [#9]
Permutation sounds like something very smart, but as far as I see this is pretty simple, isn't it?

Simply parse the dictionary. If a word consists of characters that are all contained in the array then you can add them to the permutation list.You may also check the list for double entries. Did I miss something?

eg:

word$="ARRANGED"

dict$=read word by word from the directory

noway=0
for i=1 to len(dict$)
mi$=mid$(dict$,i,1)
if instr(word$,mi$)=0
noway=1 ; uses chars we ain't got in our word
endif
next
if noway=0 then add it to the list.


Subirenihil(Posted 2007) [#10]
Loop through all permutations that are the same length as the original word.
For each permutation check if it is a word or if the first 3 letters are a word, first four letters, five letters, etc.
If it is a word, check and see if you already have it.

You could check if you already have that word first instead of if its a word.


Sledge(Posted 2007) [#11]

Simply parse the dictionary. If a word consists of characters that are all contained in the array then you can add them to the permutation list.You may also check the list for double entries. Did I miss something?



No, I think that's probably the most straightforward way of going about it. [EDIT: Removed rest - think I misread you. But anyway, yeah, I had gotten caught up on generating every possible permutation because, that way, if a 'proper' anagram exists for a word it will crop up somewhere along the permutation list... longwinded though as you'd still have to parse each permutation to discount the gobbledygook.]

Re coding the thing. I'd wrap each letter of the original string in a type with a 'used' field (set to false for each instance before each dictionary check begins) so you could easily remove letters from the equation once they had been matched and thus avoid problems associated with letter repetition.


*See Subirenihil's post.


Sledge(Posted 2007) [#12]
Okay, something like this:


exampleDict.txt is just a text file containing...
hello
this
is
an
example
dictionary
with
one
entry
per
line
so
blitz
can
read
it
easily


So, for example, if you enter 'blitzbasic' as your source word you will get three results from the example dictionary... is, blitz & it. Not an anagram engine in itself, but it should help you build one.

EDIT: Can't help thinking that, rather than trying to reinvent the wheel based on all the incomplete ideas and code snips that we post, you should probably just reference an existing solution.


jfk EO-11110(Posted 2007) [#13]
I don't see what's wrong with my method. you even don't need a "used field" when you simply check each character of a dictionary entry with INSTR to see if i's part of the orginal main word. Ok maybe I didn't get this right, you said you don't want an anagram to be listed twice, which could be filtered easily. If you want to prevent single letters from being used multiple times then you may use the "used" field, as described, or a count_usage field that has to be 1 for valid matches.

I also think it's wrong to generate a list of possible anagrams if you are going to remove those that are not part of the dictionary anyway.


Sledge(Posted 2007) [#14]

you even don't need a "used field" when you simply check each character of a dictionary entry with INSTR to see if i's part of the orginal main word.


Well the letters that constitute the sub-word may simply be in the wrong order. 'Let Val sue I' is an (awfully crap) anagram of 'televisual' but INSTR would never recognise the presence of 'let', 'val' or 'sue' unless the source set was sufficiently reordered.


I also think it's wrong to generate a list of possible anagrams if you are going to remove those that are not part of the dictionary anyway.



Continuing with the above example, exploring every possible permutation would simply ensure that the original set was sufficiently reordered for INST to be helpful. Yes, you'd get a lot of crap to disregard that way, but somewhere along the line you will also get 'Let Val sue I' along with every other valid anagram... it's a brute force way of getting valid results into a set in a fairly easy to parse format.

Incidentally, the parser I posted would be better suited to a more elegant search as it does not need letters to be in the correct order in order to recognise dictionary entries. You could simply pull out all the sub-words and go on to see whether or not there are any perfect combinations. (The gold medal would be an engine that tested for perfect combinations that made at least some dim sort of syntactical sense).


jfk EO-11110(Posted 2007) [#15]
quote:
ensure that the original set was sufficiently reordered for INST to be helpful.

I was talking about to use INSTR to determine if ONE letter is part of a word. Not entire words. When you test all letters of word 1 against the letters of word 2 then you will know how many times each one is contained in word 2. as long as this is one time, it's an anagram. It would be pretty insane to create an array of all potential permutations, only to compare them.

But maybe I misunderstood the whole thing.


Sledge(Posted 2007) [#16]

I was talking about to use INSTR to determine if ONE letter is part of a word. Not entire words. When you test all letters of word 1 against the letters of word 2 then you will know how many times each one is contained in word 2. as long as this is one time, it's an anagram. It would be pretty insane to create an array of all potential permutations, only to compare them.



I get you - for a one word anagram it's spot on. If you want more complex results (like a bunch of sub words that are comprised of the original word's letters) then I don't think you can get away without the kind of branching permutation tree covered in the article I linked. I'm holding off on a complete solution because it seems to me that (scary) recursive functions would be appropriate. :/