generic image
Processing...
  • Games
  • Catalog
  • Develop
  • Robux
  • Search in Players
  • Search in Games
  • Search in Catalog
  • Search in Groups
  • Search in Library
  • Log In
  • Sign Up
  • Games
  • Catalog
  • Develop
  • Robux
   
ROBLOX Forum » Game Creation and Development » Scripters
Home Search
 

Re: Data Compression(RLE)

Previous Thread :: Next Thread 
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 05:57 PM
[I understand that this is a very long text. Only those who wish to learn something should read.]
Roblox filter keep blocking my text, so I'll have to write it in several posts.

Howdy!
A few days ago, I've been interested in the world of data compression, more precisely, about the RLE algorithm.
I've had previously created a topic with my questions about it(http://www.roblox.com/Forum/AddPost.aspx?PostID=162481066&mode=flat).
The answers given there were accurate, but incomplete. Some ideas I learned there is the idea of an escape code(thane first mentioned it), compressing at the bit level(Cntkillme's idea), a good idea, but useless on Roblox so I will not talk about it here.

I did more research on my own(and experimented) and I was able to answer most of my previously asked question quite accurately(Although the ideas were
spread onto several different websites, and I had to figure some on my own). What I'll do here is post everything I've found, in case others are curious to know.
You could even go ahead and edit the Wikipedia page, since more tricks are explained here.

RLE is short for Run Length Encoding. This is an algorithm that compress repeated characters into a smaller form. "AAAAAAAAAABBBBB" would become "A10B5" for example. As
you can see, writing A10B5 is much shorter than writing A ten times and B five times. That's the principle RLE is based on. Now all that is nice and good, but this
form can cause ambiguity. Compressing "AAAAA12" would give "A512". If you were to decompress this, you would end up with the letter A repeated 512 times instead of
repeating it five times, followed by the number 12! Here's the tricks I've read about or figured out myself to remove this vagueness:
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 05:58 PM
1)Use an escape code to indicate that a character is repeated multiple times. You could use "*" for example. AAAAAAAAAABBBBB would be written as A*10B*5. To solve the
"AAAAA12" problem, you would have to use the escape code twice, so it becomes A*5*12 . A much better idea than using an escape code here, is to repeat the character twice.
"AAAAAAAAAABBBBB" would turn into AA10BB5. Since the letter is repeated twice, the decompressor can understand that the character is repeated multiple times. "AAAAA12"
would become AA5A12 . Adding an extra A after 5 is necessary to make the distinction between 5 and 12.

2)Instead of writing the amount of times a character is repeated, write how many EXTRA times it is repeated. instead of changing "AAAAA" into "AA5", turn it into "AA3",
which is the same as saying that A is repeated 2+3=5 times. It's a small change, but it can help the compression rate. Figured out this one by myself.

3)Whenever exactly 3 symbols are repeated, "BBB10" for example, Do not convert it to "BB1B10", since this makes it lengthier.
Keep it as "BBB10". Your decoder can figure out that this part was not compressed, because it sees 3 similar symbol. Figured out this one by myself.

4)Get two sets of symbols for the counters. Instead of writing "119123" for "1111111111123", you could write "11J23" In this case J would represent 9(A=0 B=1 C=2..).
Figured this one out myself.
Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 06:06 PM
The most immediate thing I think of to make it is

local function RLE(s)
local fin = '';
local prevchar = '';
local prevloc = 1;
local count = 1;
for i=1,#s do
local c = s:sub(i,i);
if c == prevchar then
count = count+1;
else
fin = fin..prevchar..count;
prevchar = c;
count = 1;
end
end;
return fin:sub(2,-1);
end

Although, for decoding you can kind of safely assume that it goes char,number alternatingly...
Report Abuse
chimmihc is not online. chimmihc
Joined: 01 Sep 2014
Total Posts: 17143
24 May 2015 06:06 PM
That is a rather basic compression method.


I script -~ chimmihc
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 06:07 PM
--[[Roblox is blocking #5 for no reasons, read next post to see what it said]]


6)When only two characters are repeated followed by a symbol used to count, such as "BBµ" you got a few possibilities. First, you could write it as "BBµBµ", adding
two extra characters Or you could write it as BBµµ.(remember that I said I'd use µ as 0). The Decoder can figure out that since the letter is repeated twice with
a symbol representing 0, that it should only remove the first µ. However, using the 2nd notation(BBµµ), it is impossible to send data to the compressor twice.
The first time it would become BBµµ the next time BBµµµ and so on. You can also use a special escape code just for this like "BB/µ" but you would have to implement
a way to add the / symbol into your data. IF ANYONE HAS A BETTER IDEA FOR THIS ONE SAY SO!

Here's some example using all these tips(Uncompressed=Compressed):
"AAAAA122444VVV¶ÆÆÆÆÆ92½½½½½½" = "AAÆ122444VVV¶ÆÆ3Æ92½½4" --total compression of 6
"SAND33µ999µ213599999®®®®®®BC" = "SAND33µ3µ999µ213599Æ9®®4BC" --total compression of 2

It's impossible to make a compression algorithm that can compress Every kind of data(It's mathematically proven), so every algorithms have flaws.
In this case the flaw is #6, where two characters only are repeated followed by a count symbol.
"AA¬¬AA¬¬AA¬¬AA¬¬AA" This is an example where the algorithm would not be able to compress the data.

Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 06:07 PM
But yes you still have that issue of the 512, because numbers can go higher than 9. Just stick \n in them or something, or just some char that won't really come up otherwise
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 06:11 PM
You can easily do it on the bit level on Lua.
You don't store the number represented in ASCII from before ("10A") but you store the byte of how many there is ("\10A").

You can then easily get the next byte using string.byte and string.sub
Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 06:15 PM
Taking in mind what cnt said,

local function RLE(s)
local fin = '';
local prevchar = '';
local prevloc = 1;
local count = 1;
for i=1,#s do
local c = s:sub(i,i);
if c == prevchar then
count = count+1;
else
fin = fin..prevchar..string.char(count);
prevchar = c;
count = 1;
end
end;
return fin:sub(2,-1);
end

local function RLEdec(s)
local fin = {}
for i=1,#s,2 do
table.insert(fin,s:sub(i,i):rep(s:sub(i+1,i+1):byte()));
end
return table.concat(fin,'');
end


Or start learning to use huffman trees to try and represent your data because huffman trees.
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 06:18 PM
5)Pick rare letters for your counters. Roblox keep blocking this line, so imagine a line of ascii characters that are not used often.
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 06:19 PM
Just do it the correct way D:
Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 06:20 PM
local function RLE(s)
local fin = '';
local prevchar = '';
local prevloc = 1;
local count = 1;
for i=1,#s do
local c = s:sub(i,i);
if c == prevchar then
count = count+1;
else
fin = fin..prevchar..string.char(count);
prevchar = c;
count = 1;
end
end;
fin = fin..prevchar..string.char(count);
return fin:sub(2,-1);
end

local function RLEdec(s)
local fin = {}
for i=1,#s,2 do
table.insert(fin,s:sub(i,i):rep(s:sub(i+1,i+1):byte()));
end
return table.concat(fin,'');
end

I somehow managed to clip the end off of the converted string in my RLE
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 06:21 PM
What's prevloc for O_o
Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 06:23 PM
You know, I actually have no idea. I'd intended on using it at some point, but I'm tired and the thing just formulated in my head without prevloc when I was writing it.
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 06:23 PM
the 512 woudln't be a problem if you use all the tricks I named.
AAAAA12 becomes AA3A12

Instead of going for huffman coding I'll go with Arithmetic coding, It seems to achieve a better compression rate.
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 06:48 PM
@cntkillme

How many characters will "/65" be saved as? 8 bits? 24 bits?
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 06:49 PM
It's just 8 bits (1 byte)
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 06:50 PM
'/65' is different then '\65' so never mind.
Yours would use 3 bytes, mine would be 1.
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 06:57 PM
Then it still holds that compression at the bit level is impossible on roblox.("/" was a typo on my part)

Your idea is great, but I can't see how to use it on roblox(so I'll use it outside roblox ;p)

How do you save Less than 8 bits of information?
Since you said "\65" is 1 byte I can assume that "\10" is 1 byte too and so on..
"insert 5 bits here"

Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 07:00 PM
In strings on Lua, all characters are 1 byte large, you don't want to save only '5' bits of information because that means things like \3 you would think to save 2 bits, but then you get a varying size and have no idea how to parse that.

The point is to use a full byte before the byte that represents the character as a 'how many repetitions'.
Report Abuse
eLunate is not online. eLunate
Joined: 29 Jul 2014
Total Posts: 13268
24 May 2015 07:06 PM
If you need to encode on a bit level, you'll need to encode your entire thing into bits and then break it into bytes and use string.char. Just make sure your decoder understands where your last character is (up to 7 bits worth of ambiguity at the end)
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 07:20 PM
Oh you ment like

AAAAABCCDEFFF

as
5A1B2C1D1E3F

If so, use the proposed solution that I posted, since using this method requires you to add an extra byte for all characters( ABCDE would give 1A1B1C1D1E)


Lunare is it possible to do that on roblox?(if so how, since im very interested!)
Report Abuse
cntkillme is not online. cntkillme
Joined: 07 Apr 2008
Total Posts: 44956
24 May 2015 07:22 PM
OFC it's possible. You can do what you mentioned before and have a 'skip' count, so if the number is negative you would not duplicate |num| bytes ahead. But that limites you from 0->255 to -128->127.
Then again you can use 2 bytes instead of 1 to store this information but it really depends what you are compressing.
Report Abuse
sonicdiablo is not online. sonicdiablo
Joined: 05 Feb 2009
Total Posts: 1661
24 May 2015 07:41 PM
ah, the packbit variant. I've read about this before but I didn't find it useful to add that to this post because it was starting to get long enough(And I'll make a 2nd post to explain the other techniques such as the one you mentionned and BWT transform)

Do you know an algorithm that can find the longest repeated string?

Example: "12KIL3ABC032ABCDEFGHIJKABCDEFGHIJK3491AB4298G"

In this case it would be 'ABCDEFGHIJK'
I'd need a method better than Brute-Force testing.
Report Abuse
Previous Thread :: Next Thread 
Page 1 of 1
 
 
ROBLOX Forum » Game Creation and Development » Scripters
   
 
   
  • About Us
  • Jobs
  • Blog
  • Parents
  • Help
  • Terms
  • Privacy

©2017 Roblox Corporation. Roblox, the Roblox logo, Robux, Bloxy, and Powering Imagination are among our registered and unregistered trademarks in the U.S. and other countries.



Progress
Starting Roblox...
Connecting to Players...
R R

Roblox is now loading. Get ready to play!

R R

You're moments away from getting into the game!

Click here for help

Check Remember my choice and click Launch Application in the dialog box above to join games faster in the future!

Gameplay sponsored by:
Loading 0% - Starting game...
Get more with Builders Club! Join Builders Club
Choose Your Avatar
I have an account
generic image