Perfect Compression

Discussion in 'General Science & Technology' started by kmguru, Jan 16, 2002.

Thread Status:
Not open for further replies.
  1. kmguru Staff Member

    Messages:
    11,757
    WASHINGTON -- Physicists do not question the laws of thermodynamics. Chemistry researchers unwaveringly cite Boyle's Law to describe the relationship between gas pressure and volume.

    Computer scientists also have their own fundamental laws, perhaps not as well known, but arguably even more solid. One of those laws says a perfect compression mechanism is impossible.

    A slightly expanded version of that law says it is mathematically impossible to write a computer program that can compress all files by at least one bit. Sure, it's possible to write a program to compress typical data by far more than one bit -- that assignment is commonly handed to computer science sophomores, and the technique is used in .jpg and .zip files.

    But those general techniques, while useful, don't work on all files; otherwise, you could repeatedly compress a .zip, .gzip or .sit file to nothingness. Put another way, compression techniques can't work with random data that follow no known patterns.

    So when a little-known company named ZeoSync announced last week it had achieved perfect compression -- a breakthrough that would be a bombshell roughly as big as E=mc² -- it was greeted with derision. Their press release was roundly mocked for having more trademarks than a Walt Disney store, not to mention the more serious sin of being devoid of any technical content or evidence of peer review.

    A Reuters article was far more credulous, saying in the lead paragraph that "a Florida research startup working with a team of renowned mathematicians said on Monday it had achieved a breakthrough that overcomes the previously known limits of compression used to store and transmit data."

    For compression buffs, responding to such assertions ranks somewhere between a teeth-gnashing migraine and a full-contact sport.

    The comp.compression FAQ has an entire section devoted to debunking: "From time to time some people claim to have invented a new algorithm for (perfect compression). Such algorithms are claimed to compress random data and to be applicable recursively; that is, applying the compressor to the compressed output of the previous run, possibly multiple times."

    Several comp.compression fans have even offered rewards up to $5,000 for independently verifiable proof of perfect compression. They've never been claimed.

    Perfect compression, or even compression of a few hundred times -- what ZeoSync claims -- would revolutionize the storage, broadband and digital entertainment industry. It would mean that modems would be as fast as DSL, and DSL speeds would be blinding. A 40-GB hard drive would hold a terabyte, and so on.

    Link: http://www.wired.com/news/print/0,1294,49599,00.html
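
    You can see the article's point for yourself with a few lines of Python (a rough sketch using the standard zlib module; it has nothing to do with whatever ZeoSync claims to have invented):

```python
import os
import zlib

# Highly patterned data compresses dramatically...
text = b"the quick brown fox jumps over the lazy dog " * 1000
packed = zlib.compress(text, level=9)
print(len(text), "->", len(packed))

# ...but bytes from the operating system's random source do not.
noise = os.urandom(len(text))
print(len(noise), "->", len(zlib.compress(noise, level=9)))

# And compression is not recursive: the first pass removes the redundancy,
# so a second pass has nothing left to squeeze out.
print(len(packed), "->", len(zlib.compress(packed, level=9)))
```

    The patterned text shrinks enormously, the random bytes do not, and recompressing already-compressed output gains nothing -- which is exactly why the "compress a .zip to nothingness" idea fails.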
     
  3. Rick Valued Senior Member

    Messages:
    3,336
    Is this technique based on Shannon-Fano or maybe Huffman coding?

    By the way, don't you think that with the arrival of streaming technology, compression is losing its value?

    bye!
     
  5. kmguru Staff Member

    Messages:
    11,757
    No, we still need places to store and back up data. A company like Sears keeps 30 TB online. If we could reduce that to, say, 100 GB -- talk about money saved!

    You could buy an 80 GB drive and fill it up (at 1000:1) with everything you ever wanted, including audio, video and whatever else. We have about 1,000 video tapes full of material recorded from television. If I could compress all of that onto one hard drive -- talk about cheap storage and quick scanning.

    A place like the IRS or the NSA stores petabytes of data. What a nightmare to manage on tape...
     
  7. SeekerOfTruth Unemployed, but Looking Registered Senior Member

    Messages:
    358
    It's one thing to be able to compress data 1000 to 1 or better.

    It is a completely different thing to be able to do that type of compression with low latency. The latency any compression technique introduces is the limiting factor in real-time communications.
     
  8. kmguru Staff Member

    Messages:
    11,757
    Let us see what comes out of that company. I hope it is not like cold fusion...
     
  9. James R Just this guy, you know? Staff Member

    Messages:
    39,421
    It is a logical impossibility for any program to compress every file by at least one bit.

    Imagine we had a program that could do that. Suppose we start off with a file 10 bits long. We run it through the compressor, and out pops a 9 bit file. Again, and we have an 8 bit file. Keep going and we end up with a 1 bit file. The contents of that file can only be a single 0 or 1.

    Now we want to decompress the file back to the original 10 bits. Obviously, it is impossible, since we have only 1 bit to work with. There's simply no way to store 10 bits of information in 1 bit.
     
  10. gnormhurst Registered Member

    Messages:
    1
    This is a great example. More to the point, consider that there are 1024 possible different 10-bit files. Compress all of them down to one bit as you say. There are only two possible one-bit files. When I decompress one of these back to 10 bits, which 10 bit file will I get? I can't get all of them. Thus it is not possible to compress any (every) file by one bit. Two files cannot uniquely map back to 1024.
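
    To put concrete numbers on it (a tiny illustrative Python snippet; it just counts bit strings, it isn't anybody's compressor):

```python
# Compare every possible 10-bit file with every possible shorter file.
n_bits = 10
inputs = 2 ** n_bits                              # 1024 distinct 10-bit files
shorter = sum(2 ** k for k in range(1, n_bits))   # 1022 files of 1 to 9 bits
print(inputs, "ten-bit files, but only", shorter, "shorter files")
# A lossless compressor that shortens *every* 10-bit input must therefore
# map at least two different inputs to the same output, so decompression
# cannot recover both of them.
```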
     
  11. Rick Valued Senior Member

    Messages:
    3,336
    Cite an example of that type of compression.

    bye!
     
  12. Raimon Registered Member

    Messages:
    13
    Perhaps one just needs to stop 'thinking digital' to understand new approaches like this...
     
  13. Rick Valued Senior Member

    Messages:
    3,336
    James R,
    Hi. Won't step-by-step decompression do the trick?



    BYE!
     
  14. Rick Valued Senior Member

    Messages:
    3,336
    Hi KM, here's a slightly more explicit pointer:

    If you"re talking about lossless compression,then this has been done,as in from Huffman and Shannon Fano Algorithms of Bell labs.

    bye!
     
  15. Zenith Registered Member

    Messages:
    3
    I'm reserving judgement on this until someone either proves or disproves it.

    My take on an efficient compression system is something that I saw on "Tomorrow's World" about 8-10 years ago. At the time they were talking about using fractal encoding.
    Take an image, for example. As far as a computer is concerned, it is a series of pixels whose RGB values change from one pixel to the next. This fractal encoding procedure would find a fractal equation that describes the changes in each pixel. Once you had that equation, you could apply "standard" compression techniques to it to achieve a very small file in comparison to the original image. Decompressing the file and then solving the equation would give you a faithful reproduction of the original image.
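
    I can't speak to how a production fractal image codec works internally, but the underlying iterated-function-system idea is easy to sketch: a handful of affine-map coefficients regenerate an arbitrarily detailed picture. The toy Python below uses the published Barnsley fern maps rather than coefficients fitted to a real photograph:

```python
import random

# Barnsley fern: four affine maps (a, b, c, d, e, f) plus a probability.
# The entire "image" is encoded by these 28 numbers.
MAPS = [
    (0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),
    (0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    (0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15, 0.28,  0.26, 0.24, 0.0, 0.44, 0.07),
]

def decode(n_points=50_000):
    """'Decompress' the fern by iterating the maps (the chaos game)."""
    x, y, points = 0.0, 0.0, []
    for _ in range(n_points):
        r, acc = random.random(), 0.0
        for a, b, c, d, e, f, p in MAPS:
            acc += p
            if r <= acc:
                x, y = a * x + b * y + e, c * x + d * y + f
                break
        points.append((x, y))
    return points

pts = decode()
print(len(pts), "points regenerated from", len(MAPS) * 7, "stored numbers")
```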

    I've not really kept an eye on research in recent years, so it might be that it is in general use already.
    In fact, while doing a bit of background reading, I found this helpful FAQ... http://www.faqs.org/faqs/compression-faq/part1/section-15.html

    A list of links can be found at http://links.uwaterloo.ca/other.sites.html
     
  16. hodonkain Registered Member

    Messages:
    4
    Why is this topic on the sci-fi page? There is nothing science fiction about this.
    This is science fact. Also, what do people mean when they say "random data"?

    The program I made doesn't care about the file type or what data
    the file contains. Numbers are numbers. People are so silly.

    I programmed an application that creates a 1-megabyte file filled with
    randomly generated numbers, and my compression program still works
    just fine.

    I like to think that truly non-redundant data can be appended to more
    non-redundant data and still be considered non-redundant altogether.
    It's similar to prime numbers in a way, as well as stable atoms.
    Think abstractly, people.

    Remember that there's always another way to
    solve a problem. When your instructions to compress a file are larger than
    the file itself, then you are definitely close to the file's approximate
    minimum size.

    As far as "perfect" compression, I'll never say that it's impossible.
    Because it is possible. The word Perfect means so many things to so many
    people. Just keep an open mind. Its possible to remove or even minimize
    all known patterns within a file and call it compressed. The file's size would
    be tiny, and that compression would be great yet still possible.

    When it comes to digital data, many things are possible that aren't in the
    real world. As the great Yoda says, "You must unlearn what you have
    learned."

    Have a good one, y'all...
     
  17. Bagman Registered Senior Member

    Messages:
    31
    // Why is this topic on the sci-fi page? There is nothing science fiction about this. This is science fact. //

    I have no idea. I don't think I've posted to this forum in years. You must be replying to something I posted years ago. (I'm writing this message offline, based on an email notification I received.)

    // Also what do people mean when they say "Random Data"? //

    Random data has no predictable pattern.

    // My program that I made doesn't care about the file type or what data the file contains. Numbers are numbers. People are so silly. I programmed an application that creates a 1-megabyte file filled with randomly generated numbers, and my compression program still works just fine. //

    So you wrote a compression program? If you're getting good compression, the data isn't random.
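
    One plausible explanation -- a guess on my part, since you haven't described the file format -- is that the random numbers are written out as decimal text. The values may be random, but the encoding is redundant, and the redundancy is what a compressor exploits. A quick Python sketch of the difference:

```python
import os
import random
import zlib

# "Random numbers" stored as decimal text, one per line...
values = [random.randint(0, 255) for _ in range(400_000)]
as_text = "\n".join(str(v) for v in values).encode()
print("decimal text:", len(as_text), "->", len(zlib.compress(as_text)))

# ...versus the same amount of genuinely random raw bytes.
as_bytes = os.urandom(len(as_text))
print("raw bytes:   ", len(as_bytes), "->", len(zlib.compress(as_bytes)))
```

    The text form compresses substantially, because digits and newlines use only a handful of the 256 possible byte values; the raw random bytes do not compress at all.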

    // I like to think that truly non-redundant data can be appended to more non-redundant data and still be considered non-redundant altogether. //

    That's true, but not if the appended data is redundant with respect to the data it's appended to. For example, if you took 1K of non-redundant data and appended to it a copy of itself, the resulting 2K would be redundant.

    // It's similar to prime numbers in a way, as well as stable atoms. Think abstractly people. //

    No, it's not like the prime numbers, because they are extremely non-random, or redundant.

    // Remember that there's always another way to solve a problem. When your instructions to compress a file are larger than the file itself, then you are definitely close to the file's approximate minimum size. //

    No, that's not true. For example, PKZIP is about 50K, but it can compress a 20K file of English text to about 5K.

    // As far as "perfect" compression, I'll never say that it's impossible. Because it is possible. The word Perfect means so many things to so many people. Just keep an open mind. Its possible to remove or even minimize all known patterns within a file and call it compressed. The file's size would be tiny, and that compression would be great yet still possible. //

    "Perfect compression" means compressing every possible file by at least 1 bit. It's impossible, for a very simple reason: The number of all possible files of length N bits is greater than the number of all possible files of length shorter than N bits, so at least two different files would be the same after compression.
     
  18. hodonkain Registered Member

    Messages:
    4
    Thanks for responding to my information.
    Your reply is very clear and concise; thank you.
    I didn't mean every little detail to be set in stone, just an example to get
    some viewers to question the idea for themselves instead of listening to
    regurgitated information from other people.

    It's much better to figure it out with creative ideas.
    Conformity destroys creativity. - I had to slip that quote in there.

    // "Perfect compression" means compressing every possible file by at least 1 bit.
    // it's impossible.

    You're right. Mathematically it is impossible. I'm in complete agreement with you.

    If your definition of "perfect compression" existed,
    that would mean that if you compressed all files down to 1 byte, when you
    went to decompress one, there would be nothing to distinguish between two
    different files that had been compressed. Again, I agree with you.

    I've been working on my compression program for about a year now.
    It's much better than 7-Zip and better than CAB and all the other compression
    methods I've come across. I wrote it in DirectX. Pretty fun, actually.

    My program isn't "Perfect" per say but It removes all redundant data
    throughout a file doing several passes. It takes an extremely long time.
    The idea isn't new but I've taken a little different approach.

    I'm still not clear about what people mean by random data. Like I said, I set
    up a separate program to create a big file by generating and adding random
    bytes to it. And my compression program compresses the file just fine.

    Even though there's no real such thing as "truly random," I do my best.


    Any replies to this message are always welcome. It's great to finally find
    some smart people to chat with.
     
    Last edited: Mar 14, 2006
  19. Bagman Registered Senior Member

    Messages:
    31
    // [Bagman] "Perfect compression" means compressing every possible file by at least 1 bit. It's impossible. //

    // [hodonkain] You're right. Mathematically it is impossible. I'm in complete agreement with you. If you definition of "Perfect Compression" existed, That would mean that if you compressed all files down to 1 byte, when you go to decompress it, there would be nothing to distinguish between 2 different files that have been compressed. Again I agree with you. //

    Talking about compressing every possible file to 1 byte just makes the point clear, of course. You couldn't even compress every possible file to 1 bit shorter than the originals.

    // I've been working on my compression program for about a year now. Its much better than 7zip and beter than cab and all the other compression methods I've come across. I wrote it in DirectX. Pretty fun actually. //

    Why or how DirectX? DirectX is for improved performance etc. with games vis-a-vis video, sound, etc. Isn't it just an API for that? Then how would you write a compression program "in" DirectX?

    I don't know what "7Zip" is. I'm out of date on all this, but advanced programs I've seen don't outperform PKZip or WinZip by much. It's very hard to do, because there are diminishing returns. The so-called arithmetic encoding beats them, probably, but it's very CPU-intensive. Do you know about LZW (Lev-Zimpel-Welch) compression? PKZIP and others use variations of that.
    // My program isn't "perfect" per se, but it removes all redundant data throughout a file over several passes. It takes an extremely long time. The idea isn't new, but I've taken a slightly different approach that all the other compression programs don't seem to take. //

    Maybe they don't do it because it takes such a long time.

    // I'm still not clear about what people mean by random data. Like I said, I set up a separate program to create a big file by generating and adding random bytes to it. And my compression program compresses the file just fine. //

    If you merely add some random bytes to a file that has redundancy, you don't necessarily reduce the redundancy by much. It depends on how many you add and where you add them. If you merely append them, you just get some non-redundant stuff at the end; the rest of it still compresses as before. Even if you embed them, you can retain a lot of redundancy, depending on how many you embed, and where they're embedded. Based on what little you've said, I don't even see the point of adding or embedding random bytes.

    Randomness is actually a difficult idea. *Every* string is random in the sense that it could have been produced by chance. I suppose a string is random if all codes are equiprobable at every position in the string.

    You can't compress random data, or only very slightly.
     
  20. hodonkain Registered Member

    Messages:
    4
    I was writing it in Delphi, using C++ and DirectX as a platform. It's a newbie language but still
    effective and fun to work with. Yes, I'm sure the amount of time my
    program takes is probably the reason it's not commonly used today.
    Many people simply don't want to wait. For me, I don't mind.
    The decompression stage is normal, though; it's just the compression that takes
    a long time.
     
    Last edited: Apr 2, 2006
  21. AntonK Technomage Registered Senior Member

    Messages:
    1,083
    Why don't you go ahead and describe your algorithm? At the very least we can try to work out what the asymptotic running time would be: Big-O, Big-Omega, etc.

    -AntonK
     
  22. GMontag Registered Senior Member

    Messages:
    85
    When you are talking about the theoretical limits of compression, a "random" number is one whose content is entirely, or almost entirely, entropy: no discernible patterns. Such a random number could be used as the source of random numbers in the other sense (a series of numbers with no discernible correlation between one number and the next).
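
    One crude way to put a number on that (a Python sketch; it only measures the single-byte distribution, which is a much weaker notion than true incompressibility):

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (max 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

english = b"it was the best of times it was the worst of times " * 200
noise = os.urandom(len(english))
print("english text:", round(byte_entropy(english), 2), "bits/byte")
print("random bytes:", round(byte_entropy(noise), 2), "bits/byte")  # near 8.0
```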
     
  23. hodonkain Registered Member

    Messages:
    4
    I think people mistake non-redundant data for random. They aren't the same thing. Even though something is random, it can still create patterns unintentionally.

    Assuming I have an algorithm, I don't know what it would be. I tend to do things differently than other people. If anything new is going to come out of all this, I have to do things in a way that hasn't been done before.

    To create something new, I have to think outside the box. Sorry, I'm drinking a little bit while I'm typing. lol
     
    Last edited: Mar 15, 2006