urlencode a string


Henri(Posted 2012) [#1]
Hello,
since I have never posted anything other than questions, I thought I'd post this in case someone finds it useful. The basic idea is to convert characters that might cause problems into hex values, for use in web URLs, or when you want to store text in a database table with an SQL query and the string can't be passed as a value otherwise.


Strict

Local test:String = "test''~q~q~~"

Print "Original string = "+test

Local a:String=urlencode(test)

Print "urlencoded = "+a

Local b:String=urldecode(a)

Print "urldecoded = "+b

Function urlencode:String(str:String)
	If str="" Then Return str
	
	' Characters to escape: space plus the list below
	'---------------------------
	Local bad:String = " ~~<>%}\];?@&#{|^[`/:=$+~q'~n~r~t~0"
	'---------------------------
	Local result:String
	Local char:String
	For Local i:Int = 1 To Len(str)
		char=Mid(str,i,1)
		If Instr(bad,char,1)
			' Hex() returns 8 digits; keep the last two, so ' becomes %27
			' (only valid for character codes below 256)
			result:+ "%"+Right(Hex(Asc(char)),2)
		Else
			result:+char
		EndIf
	Next
	Return result
EndFunction

Function urldecode:String(str:String)
	If str="" Then Return str
	Local result:String
	Local char:String
	For Local i:Int = 1 To Len(str)
		char=Mid(str,i,1)
		' Only decode a "%" that is followed by two more characters
		If char="%" And i+2<=Len(str)
			' "$" marks a hex literal, so "$27" converts to 39
			result:+Chr(Int("$"+Mid(str,i+1,2)))
			i:+2
		Else
			result:+char
		EndIf
	Next
	Return result
EndFunction




-Henri


ziggy(Posted 2012) [#2]
Thanks for sharing. However, this does not take into account accents or any other non-English characters. I would consider basing it on a list of "allowed" characters instead of a list of disallowed ones, as the allowed characters are far fewer than the disallowed ones in a Unicode string (think of Chinese or Japanese, for that matter).
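
A minimal sketch of that allow-list idea, in case it helps. The function name UrlEncodeAllowList and the exact allow-list (letters, digits and "-._~") are my own assumptions, not taken from the code above. It only escapes character codes below 256; anything higher is passed through unchanged, because a single %XX cannot represent it.

```blitzmax
Strict

' Hypothetical helper (not from the original post): escapes every character
' that is NOT in the allow-list, instead of listing the bad ones.
Function UrlEncodeAllowList:String(str:String)
	' Assumed allow-list: ASCII letters, digits and - . _ ~ ("~~" is a literal tilde)
	Local allowed:String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~~"
	Local result:String
	For Local i:Int = 0 Until str.length
		Local code:Int = str[i] ' character code at position i
		If code >= 256
			' Cannot be written as one %XX; left as-is in this sketch
			result :+ Chr(code)
		ElseIf allowed.Find(Chr(code)) >= 0
			result :+ Chr(code)
		Else
			' Hex() returns 8 digits; keep the last two
			result :+ "%" + Right(Hex(code), 2)
		EndIf
	Next
	Return result
EndFunction

Print UrlEncodeAllowList("a b&c=d")
```

The advantage is that you never have to remember to add a new "bad" character: anything you did not explicitly allow gets escaped.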


Henri(Posted 2012) [#3]
Thanks for the reply,
and you are quite right. This function only operates in values between 1-256 (the standard Blitzmax range I presume) and not beyond that, as you might encounter with non-western languages. Originally I was using this to convert an info text to be saved to a database field with newlines and other special characters intact, so that it would look the same when displayed. I knew what to expect, so I didn't bother to take this into account. I think a UTF-friendly version wouldn't be that difficult to make?

-Henri


Htbaa(Posted 2012) [#4]
There's also a URLEncode and URLDecode in the Code Archives: http://www.blitzmax.com/codearcs/codearcs.php?code=1581. I used and altered it for use in htbaapub.rest: https://github.com/Htbaa/rest.mod/blob/master/url.bmx

So far those functions have worked great for creating decent URL encoded strings.

In your case, for databases, might I suggest looking into the provided methods for escaping data? I believe bah.database already does this for you, and I would be surprised if your database of choice doesn't have built-in support for prepared statements in which you can bind values. Let the database API take care of the escaping.


ziggy(Posted 2012) [#5]
This function only operates in values between 1-256 (the standard Blitzmax range I presume)
BlitzMax is full Unicode compliant, so chars could go from 0 to 65536. Also, in the 0 - 255 range there are letters like: áéíóúàèìòùäëïöüâêîôûÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÝýñÑçÇ among others...
Supporting Unicode characters is a bit complicated. It usually means splitting each character into bytes and then percent-encoding those bytes.
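
A rough sketch of that byte encoding, assuming UTF-8 as the target (the usual choice for URLs). The names PercentByte and UrlEncodeUTF8 are made up for this example, and the allow-list is an assumption. Each 16-bit character is split into 1 to 3 UTF-8 bytes and every byte is written as %XX. Surrogate pairs are not combined here, so characters outside the 16-bit range would not come out as valid UTF-8.

```blitzmax
Strict

' Hypothetical helper: writes one byte as %XX
Function PercentByte:String(b:Int)
	Return "%" + Right(Hex(b & $FF), 2)
EndFunction

' Hypothetical sketch: percent-encodes a string as UTF-8 bytes.
' Assumes each character is a single 16-bit value (no surrogate pairs).
Function UrlEncodeUTF8:String(str:String)
	' Assumed allow-list of characters that are copied unchanged
	Local allowed:String = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~~"
	Local result:String
	For Local i:Int = 0 Until str.length
		Local code:Int = str[i]
		If code < $80
			' Plain ASCII: one byte, escaped only if not allowed
			If allowed.Find(Chr(code)) >= 0
				result :+ Chr(code)
			Else
				result :+ PercentByte(code)
			EndIf
		ElseIf code < $800
			' Two UTF-8 bytes: 110xxxxx 10xxxxxx
			result :+ PercentByte($C0 | (code Shr 6))
			result :+ PercentByte($80 | (code & $3F))
		Else
			' Three UTF-8 bytes: 1110xxxx 10xxxxxx 10xxxxxx
			result :+ PercentByte($E0 | (code Shr 12))
			result :+ PercentByte($80 | ((code Shr 6) & $3F))
			result :+ PercentByte($80 | (code & $3F))
		EndIf
	Next
	Return result
EndFunction

' Chr($E9) is "é"; it is built from its code to avoid source-file encoding issues
Print UrlEncodeUTF8("caf" + Chr($E9))
```

Decoding would be the reverse: collect the %XX bytes first, then turn the UTF-8 byte sequences back into characters.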



Henri(Posted 2012) [#6]
Hello all,

@htbaa:

I had a hunch someone had tried this before, and if there are already tried and tested methods then that is a good thing. My purpose was to write something simple and easy just to demonstrate a basic idea. As for the database thing, I didn't know that binding values for prepared statements would somehow preformat the string? The string I'm passing as a value for the UPDATE statement creates an error saying that a token is not recognized, or something like that. I thought that all the string manipulation in the bah.database module was for converting ISO to UTF8 and back.

@ziggy:

This hasn't always been the case, if I remember correctly? So if I were to use asc("<insert some weird character here>"), it would return the correct Unicode number?


-Henri


Yasha(Posted 2012) [#7]
BlitzMax is full Unicode compliant, so chars could go from 0 to 65536.


If chars only go from 0 to 65535, then it definitely isn't fully Unicode compliant: Unicode defines 1,114,112 code points (U+0000 to U+10FFFF), far more than fit in 16 bits. Fixed two-byte-wide chars is the obsolete UCS-2 format and is not considered a modern Unicode-compliant representation.

For practical purposes, it is enough (covers all the European languages and the commonly used CJK characters), but it isn't real Unicode, and it definitely doesn't count as proper internationalisation support.



ziggy(Posted 2012) [#8]
@Yasha: You're right. Anyway, I *think* BlitzMax uses 32-bit characters for its string representation, if I'm not wrong.


ProfJake(Posted 2012) [#9]
Nope. You can see it in the code, namely the "blitz_string.h" file.
BlitzMax does indeed only use the outdated UCS-2 for strings, so no freaky Unicode characters : )

+1 on Yasha's post