|
| 09 Jan 2013 07:51 AM |
I'm working on some magic and I've been wondering how I can accomplish this.
A variable length quantity is basically a way of storing a number using the fewest bytes possible. In each byte, a portion of the number is stored in the lower 7 bits. The 8th (most significant) bit signals whether another byte follows or the current byte is the final one. The 7-bit groups are in big-endian order, most significant group first.
E.g. 0x7B = 01111011. The 8th bit is not set, so the number is 1111011, or 123.
0x8768 = 10000111 01101000. In the first byte the 8th bit is set, so we know to include the second byte. The number is 0000111 1101000 (remove the 8th bits and join the groups of 7 bits together), or 1000.
0xFFFFFF7F = 11111111 11111111 11111111 01111111. Each of the bytes except the last has the 8th bit set, so we know to include 4 bytes. The resulting number is 1111111 1111111 1111111 1111111, or 0x0FFFFFFF.
Hopefully I've explained that well enough.
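For reference, here is a minimal sketch of the format described above, written to run on plain Lua 5.1. The names vlqEncode and vlqDecode are made up for illustration, and the code assumes non-negative integers; it is not necessarily how the original poster implemented it.

```lua
-- Sketch only: 7 payload bits per byte, big endian,
-- 8th bit set on every byte except the last one.
local function vlqEncode(n)
    -- Final byte: low 7 bits of n, 8th bit clear.
    local bytes = { string.char(n % 128) }
    n = math.floor(n / 128)
    while n > 0 do
        -- Earlier bytes: next 7 bits, 8th bit set (continuation).
        table.insert(bytes, 1, string.char(128 + n % 128))
        n = math.floor(n / 128)
    end
    return table.concat(bytes)
end

-- Reads one quantity from s starting at pos (default 1).
-- Returns the value and the index of the next unread byte.
local function vlqDecode(s, pos)
    pos = pos or 1
    local n = 0
    while true do
        local b = s:byte(pos)
        assert(b, "truncated quantity")
        pos = pos + 1
        n = n * 128 + b % 128
        if b < 128 then
            return n, pos
        end
    end
end

print(vlqEncode(1000):byte(1, -1)) --> 135 104   (0x87 0x68)
print(vlqDecode("\135\104"))       --> 1000 3
print(vlqEncode(128):byte(1, -1))  --> 129 0     (contains a null byte)
```

The last line shows where the nulls come from: any value whose low 7 bits are all zero ends in a \0 byte.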
The problem arises when the algorithm produces null chars (\0), which Roblox's StringValues (and possibly other string storage) treat as the string terminator, ignoring everything after that point.
Can anyone think of a different way to represent the data so as to not use any null chars? The best solution I've come up with so far is to only use 6 bits of the final byte, leaving one of the bits constantly set. |
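One way to read the "6 bits in the final byte" idea, as a rough sketch: the final byte keeps its 8th bit clear but always has its 7th bit (value 64) set, so it falls in the range 64-127, while continuation bytes stay in 128-255. No byte can then be zero. Which bit is kept set, and the names encode6 and decode6, are assumptions made for this example.

```lua
-- Sketch only: final byte = 64 + low 6 bits, earlier bytes = 128 + 7 bits.
local function encode6(n)
    local bytes = { string.char(64 + n % 64) }
    n = math.floor(n / 64)
    while n > 0 do
        table.insert(bytes, 1, string.char(128 + n % 128))
        n = math.floor(n / 128)
    end
    return table.concat(bytes)
end

-- Returns the value and the index of the next unread byte.
local function decode6(s, pos)
    pos = pos or 1
    local n = 0
    while true do
        local b = s:byte(pos)
        assert(b, "truncated quantity")
        pos = pos + 1
        if b >= 128 then
            n = n * 128 + (b - 128)       -- continuation byte: 7 payload bits
        else
            return n * 64 + b % 64, pos   -- final byte: 6 payload bits
        end
    end
end

print(encode6(0):byte(1, -1))    --> 64
print(encode6(1000):byte(1, -1)) --> 143 104
print(decode6("\143\104"))       --> 1000 3
```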
Anaminus
|
  |
 |
| Joined: 29 Nov 2006 |
| Total Posts: 5945 |
|
|
| 09 Jan 2013 11:02 AM |
I put some thought into this while writing a network system. It seems easiest to just escape null characters with a string such as "\0", and then "\\" for backslashes. You could even extend it to do some basic compression, to make up for using two characters for every null or backslash. For example, you could have 4 escape sequences:
\0: null character
\1x: null characters
\2: backslash
\3x: backslashes
\0 and \2 escape a single occurrence of a character. \1x and \3x can escape runs of 2-256 characters. The "x" byte holds the number of escaped characters *minus one*, so it ranges from 1 to 255 and is never a null itself. This is done to squeeze in some extra compression, since there's already an escape sequence for 1 character.
I find this easiest because it's just an extra layer that can easily be undone. The escaped data doesn't conflict with null-termination, so you can keep formatting the underlying data however you want. Plus, you're all set when null-termination is fixed for string values; all you have to do is stop escaping. |
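The four escape sequences above could be sketched roughly like this. It assumes each code is a backslash followed by the ASCII digit 0-3, and that runs longer than 256 are split into several escapes; the function names escape and unescape are made up for the example.

```lua
local ESC = "\\"

-- Replaces every null and backslash in data with an escape sequence.
local function escape(data)
    local out, i, len = {}, 1, #data
    while i <= len do
        local b = data:byte(i)
        if b == 0 or b == 92 then            -- 92 is the backslash
            -- Measure the run of identical bytes, capped at 256.
            local run = 1
            while run < 256 and data:byte(i + run) == b do
                run = run + 1
            end
            if run == 1 then
                out[#out + 1] = ESC .. (b == 0 and "0" or "2")
            else
                -- The count byte is run - 1, so it is in 1..255, never null.
                out[#out + 1] = ESC .. (b == 0 and "1" or "3") .. string.char(run - 1)
            end
            i = i + run
        else
            out[#out + 1] = string.char(b)
            i = i + 1
        end
    end
    return table.concat(out)
end

-- Undoes escape().
local function unescape(text)
    local out, i, len = {}, 1, #text
    while i <= len do
        local c = text:sub(i, i)
        if c ~= ESC then
            out[#out + 1] = c
            i = i + 1
        else
            local code = text:sub(i + 1, i + 1)
            if code == "0" then
                out[#out + 1] = "\0"
                i = i + 2
            elseif code == "2" then
                out[#out + 1] = ESC
                i = i + 2
            elseif code == "1" or code == "3" then
                local ch = (code == "1") and "\0" or ESC
                out[#out + 1] = ch:rep(text:byte(i + 2) + 1)
                i = i + 3
            else
                error("bad escape sequence at position " .. i)
            end
        end
    end
    return table.concat(out)
end

print(escape("a\0b"))               --> a\0b   (4 characters, no null)
print(#escape(("\0"):rep(200)))     --> 3
```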
| 09 Jan 2013 06:10 PM |
You could have the byte at the start, 'X', tell you how many bytes are in the byte count value that follows (1-256); the value with X digits after that then tells you how many bytes are in the variable length thingie. That gives you a variable length of up to 3.231700607131100730071487668867e+616, but I'm sure you can manage with this limitation.
X#####asdfkjnasdjkfnaksjdfniuaghiabreguaergehfbawufjhbasjhdfbajhefbaubvgaasfdghbajhsvfuaweg |
| 09 Jan 2013 06:11 PM |
| Wait, wrong problem. Lua can't write \10 to a file by the way. |
Waffle3z
|
  |
| Joined: 15 Apr 2011 |
| Total Posts: 266 |
|
|
| 09 Jan 2013 06:23 PM |
"3.231700607131100730071487668867e+616"
there is not a single '5' in this number. |
HotThoth
|
  |
 |
| Joined: 24 Aug 2010 |
| Total Posts: 1176 |
|
|
| 09 Jan 2013 06:33 PM |
^ y u hate 9?
- HotThoth
~ I Thoth so ~ |
| 09 Jan 2013 06:39 PM |
@Waffle No, there are 71 fives.
32317006071311007300714876688669951960444102669715484032130345427524655138867890893197201411522913463688717960921898019494119559150490921095088152386448283120630877367300996091750197750389652106796057638384067568276792218642619756161838094338476170470581645852036305042887575891541065808607552399123930385521914333389668342420684974786564569494856176035326322058077805659331026192708460314150258592864177116725943603718461857357598351152301645904403697613233287231227125684710820209725157101726931323469678542580656697935045997268352998638215525166389437335543602135433229604645318478604952148193555853611059596230656 |
LPGhatguy
|
  |
 |
| Joined: 27 Jun 2008 |
| Total Posts: 4725 |
|
|
| 09 Jan 2013 06:44 PM |
@xXxMoNkEyMaNxXx But you hid them all with that ridiculous contraction to scientific notation! You evil statistician. |
| 09 Jan 2013 09:32 PM |
The problem with escaping nulls is that they occur frequently but not in runs. The only place they can occur is in a variable length quantity which is guaranteed to have non-nulls directly before and after. What I could do is use an uncommon character to represent 0 and then escape that when it comes up for real.
Is a fix for StringValue null-termination actually planned? |
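The substitution idea (a stand-in character for 0, escaped when it occurs for real) might look something like the sketch below. The byte values 254 and 255 for the stand-in and the escape character are hypothetical picks, as are the names hideNulls and restoreNulls; a real choice would depend on which bytes are rare in the actual data.

```lua
local SUB = 254   -- hypothetical stand-in for a null byte
local ESC = 255   -- hypothetical escape byte

local function hideNulls(data)
    local out = {}
    for i = 1, #data do
        local b = data:byte(i)
        if b == 0 then
            out[#out + 1] = string.char(SUB)      -- null becomes the stand-in
        elseif b == SUB then
            out[#out + 1] = string.char(ESC, 1)   -- a real stand-in byte
        elseif b == ESC then
            out[#out + 1] = string.char(ESC, 2)   -- a real escape byte
        else
            out[#out + 1] = string.char(b)
        end
    end
    return table.concat(out)
end

local function restoreNulls(text)
    local out, i = {}, 1
    while i <= #text do
        local b = text:byte(i)
        if b == SUB then
            out[#out + 1] = "\0"
            i = i + 1
        elseif b == ESC then
            local code = text:byte(i + 1)
            assert(code == 1 or code == 2, "bad escape code")
            out[#out + 1] = string.char(code == 1 and SUB or ESC)
            i = i + 2
        else
            out[#out + 1] = string.char(b)
            i = i + 1
        end
    end
    return table.concat(out)
end

print(hideNulls("a\0b"):byte(1, -1))   --> 97 254 98
```

A null costs one byte here instead of two, at the price of handling three special cases instead of two.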
| 12 Jan 2013 06:27 AM |
I think I'll go with only using 6 bits for the final char.
Representing 0 with another character means I have to escape less often, but that creates 3 cases (escaping nulls, the substitution (when it appears for real), and the escape character) which isn't very pretty.
Using only 6 bits shouldn't make too much of a difference. The values are usually either really small or relatively big, so they might stay within the range of 1 character and the loss of a bit wouldn't be enough to raise the large ones up to 3 bytes (plus they're uncommon enough to not matter much).
Unless anyone has a better solution. |
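To put rough numbers on the cost of giving up one bit in the final byte, here is a small sketch that counts encoded bytes for a given value. The helper name vlqLength and its finalBits parameter are invented for this example: 7 means the standard format, 6 means the variant with one bit of the final byte always set.

```lua
-- Number of bytes needed when the final byte carries finalBits payload bits
-- and every earlier byte carries 7.
local function vlqLength(n, finalBits)
    local len = 1
    n = math.floor(n / 2 ^ finalBits)
    while n > 0 do
        len = len + 1
        n = math.floor(n / 128)
    end
    return len
end

print(vlqLength(63, 7), vlqLength(63, 6))       --> 1 1
print(vlqLength(100, 7), vlqLength(100, 6))     --> 1 2
print(vlqLength(8191, 7), vlqLength(8191, 6))   --> 2 2
print(vlqLength(8192, 7), vlqLength(8192, 6))   --> 2 3
```

So the variant only costs an extra byte for values in 64-127, 8192-16383, and the matching band below each later byte boundary.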
| 12 Jan 2013 01:09 PM |
Here's one that turns a number into a binary string, in the format you described.
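function toBin(num)
  -- Zero never enters the loop below, so return one all-zero byte.
  if num == 0 then return "00000000" end
  local s = ""
  local cur = 0 -- bit being examined, 0 = least significant
  while num > 0 do
    -- Prepend bit `cur` of num, clearing it if it was set.
    if num % ((2 ^ cur) * 2) >= 2 ^ cur then
      s = "1"..s
      num = num - 2 ^ cur
    else
      s = "0"..s
    end
    -- After every 7 payload bits, prepend the flag bit:
    -- 0 for the final byte, 1 for every byte before it.
    if (cur + 1) % 7 == 0 then
      s = (cur == 6 and "0" or "1")..s
    end
    cur = cur + 1
  end
  -- Pad the leading byte to 8 bits; its flag is 1 only if more bytes follow.
  local add = #s % 8
  return add == 0 and s or (cur > 7 and "1" or "0")..("0"):rep(8 - add - 1)..s
end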
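Since the function above returns a string of "0" and "1" characters rather than raw bytes, something like the following sketch could pack that text into actual bytes. The helper name bitsToBytes is made up for this example, and it assumes the input length is a multiple of 8.

```lua
-- Packs a string of "0"/"1" characters into raw bytes, 8 bits per byte.
local function bitsToBytes(bits)
    assert(#bits % 8 == 0, "expected a whole number of bytes")
    local out = {}
    for i = 1, #bits, 8 do
        -- tonumber with base 2 parses one 8-character group.
        out[#out + 1] = string.char(tonumber(bits:sub(i, i + 7), 2))
    end
    return table.concat(out)
end

print(bitsToBytes("1000011101101000"):byte(1, -1))   --> 135 104
```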