|
| 21 Feb 2015 05:02 AM |
I need to shrink plain English text a lot, so I basically need to compress it.
We aren't talking 5, 10, or 20 characters; I mean tens of thousands.
What's a way to do this? I can't lose a single character in the process, so I'll need to be able to compress and decompress the text.
The time it takes to compress and decompress the text is pretty irrelevant.
I was thinking of parsing the string and assigning each word a number, counting up by 1 for each new word (1, 2, 3, ...). The average word is 5 letters.
With numbers, assuming 5,000 words with the average 5 letters, it would be 18,893 + 5,000 (for spaces) = 23,893.
Without doing this, it would be 25,000 characters, and with the 5,000 spaces, it would be 30,000 characters.
So I would save 6,107 characters in this process.
That's actually a pretty big deal, because one of the things this will be involved with is datastore interaction, and the data limit for character length is 64,998. However after 10,000 words it becomes sort of useless, because the number of characters to encode (5 on average) will start being equal to the number of characters in the number (5 digits).
It isn't an option to make multiple requests.
This is really important. |
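The arithmetic in the post above can be checked with a few lines of Lua. This is only a sketch of the estimate, assuming every one of the 5,000 words is distinct and is written as its own number from 1 to 5,000; `digitsUpTo` is a made-up helper name.

```lua
-- Count the characters needed to write every integer from 1 to n in decimal.
local function digitsUpTo(n)
	local total = 0
	for i = 1, n do
		total = total + #tostring(i)
	end
	return total
end

local words = 5000
local avgLen = 5

local plain = words * avgLen + words        -- letters plus one space per word
local numbered = digitsUpTo(words) + words  -- digits plus one space per word

print(plain, numbered, plain - numbered)
```

The digit total comes out to 18,893, so the estimate of 6,107 characters saved holds under those assumptions.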
|
|
|
|
chimmihc
|
  |
| Joined: 01 Sep 2014 |
| Total Posts: 17143 |
|
|
| 21 Feb 2015 05:10 AM |
i dont know
compress the text
then split it into easy to manage amounts
then place them in a table
meh |
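A rough sketch of the "split it and put the pieces in a table" idea, for reference. The function name and the chunk size are made up for illustration; splitting does not reduce the total character count, it only breaks the text into pieces.

```lua
-- Split text into pieces of at most `size` characters, kept in order.
local function splitIntoChunks(text, size)
	local chunks = {}
	for start = 1, #text, size do
		chunks[#chunks + 1] = text:sub(start, start + size - 1)
	end
	return chunks
end

local chunks = splitIntoChunks("abcdefgh", 3)
-- Joining the pieces in order gives the original text back.
print(#chunks, table.concat(chunks))
```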
|
|
|
|
|
| 21 Feb 2015 05:12 AM |
can't split the text up, it'll all have to come together in the end so it's pointless
i need something which will make it have less characters than it does
i also noticed a major flaw in method i mentioned in the OP: how the hell am i supposed to keep track of what words == what numbers...without saving the matching index for words = numbers which would end up being against the whole point
i need an algorithm |
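One family of algorithms that avoids the "where do I keep the word list" problem is dictionary coding such as LZW: the decoder rebuilds the same dictionary as it reads the codes, so the dictionary never has to be saved. Nobody in the thread proposed this, so treat the following as a minimal sketch rather than a drop-in solution. It returns a list of integer codes, and turning those codes into a compact string for storage is a separate step that is not shown.

```lua
-- LZW sketch: the dictionary starts with all single bytes and grows
-- identically on both sides, so only the codes need to be kept.
local function lzwCompress(text)
	local dict, nextCode = {}, 256
	for i = 0, 255 do dict[string.char(i)] = i end
	local codes, current = {}, ""
	for i = 1, #text do
		local ch = text:sub(i, i)
		local candidate = current .. ch
		if dict[candidate] then
			current = candidate
		else
			codes[#codes + 1] = dict[current]
			dict[candidate] = nextCode
			nextCode = nextCode + 1
			current = ch
		end
	end
	if current ~= "" then codes[#codes + 1] = dict[current] end
	return codes
end

local function lzwDecompress(codes)
	if #codes == 0 then return "" end
	local dict, nextCode = {}, 256
	for i = 0, 255 do dict[i] = string.char(i) end
	local previous = dict[codes[1]]
	local out = { previous }
	for i = 2, #codes do
		local entry = dict[codes[i]]
		if not entry then
			-- Code not in the table yet: it can only be previous + its first char.
			entry = previous .. previous:sub(1, 1)
		end
		out[#out + 1] = entry
		dict[nextCode] = previous .. entry:sub(1, 1)
		nextCode = nextCode + 1
		previous = entry
	end
	return table.concat(out)
end

local sample = "to be or not to be, to be or not to be, to be or not to be"
local codes = lzwCompress(sample)
print(#sample, #codes, lzwDecompress(codes) == sample)
```

Repetitive text produces fewer codes than characters, but each code can be larger than 255, so the real saving depends on how the codes are written out.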
|
|
|
|
|
| 21 Feb 2015 05:13 AM |
What is the limit on DataStore for numbers? Perhaps you can convert to ascii and back:
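local ascii = {}
-- Turn each character into its byte value, separated by spaces.
function ascii.byte(s)
	local str = ""
	for i = 1, #s do
		str = str .. s:sub(i, i):byte() .. " "
	end
	return str
end
-- Rebuild the text from the space-separated byte values.
function ascii.char(s)
	local str = ""
	for num in s:gmatch("%S+") do
		str = str .. string.char(num)
	end
	return str
end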
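Worth noting before relying on this: writing each character as a decimal byte value plus a space makes the text longer, not shorter. A quick check, using hypothetical helper names `toCodes` and `fromCodes` that do the same conversion as the functions above:

```lua
-- Same idea as the post above: one decimal byte value per character,
-- each followed by a space.
local function toCodes(s)
	local parts = {}
	for i = 1, #s do
		parts[#parts + 1] = s:byte(i) .. " "
	end
	return table.concat(parts)
end

local function fromCodes(s)
	local parts = {}
	for num in s:gmatch("%S+") do
		parts[#parts + 1] = string.char(tonumber(num))
	end
	return table.concat(parts)
end

local encoded = toCodes("Hello")
print(#"Hello", #encoded, fromCodes(encoded))
```

Here "Hello" goes from 5 characters to 19, so the round trip works but the size goes the wrong way.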
|
|
|
|
|
| 21 Feb 2015 05:14 AM |
NOTE TO SELF: What I *COULD* do is make a table of every English word in regular use (so a lot of them) and assign numbers to that...though still probably pretty bad
"What is the limit on DataStore for numbers? "
There's no limit based on the type of variable, it's just a data limit of a bit more than 64,000 characters. |
|
|
|
|
|
| 21 Feb 2015 05:16 AM |
Well if I took purely the most popular words in the English language I could probably save some characters.
For example, if I assigned 1 to "to" and 2 to "the" and 3 to "for" and 4 to "we" and a few others like that, I wonder how many characters that would save. It's sort of hard to tell. :/ |
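One way to find out how much a fixed table of common words saves is to measure it. A small sketch using the four words from the post; it assumes the text contains no digits of its own (otherwise decoding would turn them into words), and it only matches lowercase spellings.

```lua
-- Fixed word-to-code table from the post ("for" is a Lua keyword, hence the brackets).
local codes = { to = "1", the = "2", ["for"] = "3", we = "4" }

-- Reverse table for decoding.
local words = {}
for word, code in pairs(codes) do
	words[code] = word
end

local function encode(text)
	-- gsub keeps the original word when the table has no entry for it.
	return (text:gsub("%a+", codes))
end

local function decode(text)
	return (text:gsub("%d+", words))
end

local sample = "we went to the shop for the bread"
local packed = encode(sample)
print(packed, #sample - #packed, decode(packed) == sample)
```

On this sentence it saves 8 of 33 characters. Only words longer than their code save anything, so the shortest codes are best spent on the most frequent words.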
|
|
|
|
chimmihc
|
  |
| Joined: 01 Sep 2014 |
| Total Posts: 17143 |
|
|
| 21 Feb 2015 05:17 AM |
| might i ask what you're doing? |
|
|
|
|
|
| 21 Feb 2015 05:20 AM |
"might i ask what you're doing?"
it's for a pretty big project
I'm making a command line interface, but it already has a file system, etc. (it's a wild comparison but think of Linux), and this base will be able to be manipulated to be used as a utility for productivity and time saving
it's hard to explain
as part of this there's going to be *lots* of text to save, and since this is meant to be a sideline utility i need to minimize my datastore request count per minute (as it will take away from the main game), and it's going to be high enough as it is |
|
|
|
|
|
| 21 Feb 2015 05:28 AM |
hmm
function Compress(s)
	local Word_For_Number = {}
	local Num_Items = 0
	local Compressed = ""
	for word in s:gmatch("%S+") do
		if not Word_For_Number[Num_Items] then
			Num_Items = Num_Items + 1
			Word_For_Number[word] = Num_Items
			Compressed = Compress .. tostring(NumItems)
		else
			Compressed = Compress .. tostring(Word_For_Number[Num_Items])
		end
	end
	return Compressed, Word_For_Number
end
function Decompress(str, Word_For_Number)
	local Decompressed = ""
	for i = 1,#str do
		for ii,v in pairs(Word_For_Number) do
			if v == str:sub(i, i)
				Decompressed = Decompressed .. ii
			end
		end
	end
	return Decompressed
end
local compressed = Compress("Hello World, how's it going, ha ha ha, this is funny, YEAH!, oh boy, what's happening.! Grammerz mistakesz")
print(compressed) print(Decompress(compressed))
--Sure this may not be the most efficient, but I'm giving you a general example of how you can do this.
--No guarantee it works; I wrote it on the fly. |
|
|
|
|
|
| 21 Feb 2015 05:29 AM |
Oops...
local compressed, Word_Table = Compress("Hello World, how's it going, ha ha ha, this is funny, YEAH!, oh boy, what's happening.! Grammerz mistakesz")
print(compressed) print(Decompress(compressed, Word_Table)) |
|
|
|
|
| |
|
|
| 21 Feb 2015 05:40 AM |
Bug fixes:
function Compress(s)
	local Word_For_Number = {}
	local Num_Items = 0
	local Compressed = ""
	for word in s:gmatch("%S+") do
		if not Word_For_Number[Num_Items] then
			Num_Items = Num_Items + 1
			Word_For_Number[word] = Num_Items
			Compressed = Compressed .. tostring(NumItems)
		else
			Compressed = Compressed .. tostring(Word_For_Number[Num_Items])
		end
	end
	return Compressed, Word_For_Number
end
function Decompress(str, Word_For_Number)
	local Decompressed = ""
	for i = 1,#str do
		for ii,v in pairs(Word_For_Number) do
			if v == str:sub(i, i) then
				Decompressed = Decompressed .. ii
			end
		end
	end
	return Decompressed
end
local compressed, Word_List = Compress("Hello World, how's it going, ha ha ha, this is funny, YEAH!, oh boy, what's happening.! Grammerz mistakesz")
print(compressed) print(Decompress(compressed, Word_List)) |
|
|
|
|
|
| 21 Feb 2015 05:42 AM |
well uh it didnt error
print(compressed) print(Decompress(compressed, Word_List))
nilnilnilnilnilnilnilnilnilnilnilnilnilnilnilnilnilnil
|
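The run of "nil" looks like it comes from three things in the posted code: `NumItems` is an undefined variable (the counter is named `Num_Items`), so `tostring(NumItems)` is "nil" for every word; the lookup uses `Word_For_Number[Num_Items]` instead of `Word_For_Number[word]`, so no word is ever found; and `Decompress` compares a number to a one-character string, which never matches. A possible corrected sketch follows, keeping the same function names. It separates the numbers with spaces, collapses runs of whitespace to a single space, and still needs the word list saved alongside the compressed text, so it is not a full answer to the original problem.

```lua
local function Compress(s)
	local Number_For_Word = {}  -- word -> number, for lookup while compressing
	local Word_For_Number = {}  -- number -> word, returned for decompressing
	local ids = {}
	for word in s:gmatch("%S+") do
		local id = Number_For_Word[word]
		if not id then
			id = #Word_For_Number + 1
			Number_For_Word[word] = id
			Word_For_Number[id] = word
		end
		ids[#ids + 1] = tostring(id)
	end
	return table.concat(ids, " "), Word_For_Number
end

local function Decompress(str, Word_For_Number)
	local words = {}
	for id in str:gmatch("%d+") do
		words[#words + 1] = Word_For_Number[tonumber(id)]
	end
	return table.concat(words, " ")
end

local text = "Hello World, how's it going, ha ha ha, this is funny, YEAH!, oh boy, what's happening.! Grammerz mistakesz"
local compressed, Word_List = Compress(text)
print(compressed)
print(Decompress(compressed, Word_List))
```

Because punctuation stays attached to words, "ha" and "ha," get different numbers, which limits how much repetition this catches.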
|
|
|
|
|
| 21 Feb 2015 06:00 AM |
sorry I didn't answer your party I was AFK and when I got back it was over, it glitched out when you went offline for a bit to go BRB
i gotta sleep for now im tired asf |
|
|
|
|