Presenting HexBuffers, a platform-independent library for dealing with binary data

Note: HexBuffers is a part of HexLicense and ships with the product.

In the previous update we shipped an auxiliary unit called FMX.hexbuffers.pas. This was to give our customers early access to some of the features in store for HexLicense. Well, a lot has happened since the last update and we want to give you a quick update on what to expect, and more hands-on information about HexBuffers.

First of all, Ironwood for Smart Pascal is now ready! It will ship with the next update and is free for all existing customers. It allows you to use Smart Mobile Studio to write node.js based servers that deliver licensing solutions for your applications. Node.js gives you plenty of perks out of the box, like being scalable, capable of running in cluster mode (where high performance is achieved by dividing the payload between several server instances) and much, much more.

Host your licensing solutions anywhere on any platform with node.js

Node.js is also fairly universal and can be hosted on any computer system regardless of operating system. Code is also portable, meaning that unless your code uses some very platform-specific third-party libraries, you can move your servers and services between Windows, Linux and Unix without any changes needed.

Speed is also phenomenal. We were unable to detect any speed advantage for Delphi compared to node.js. In fact, node.js performs better in most cases than multi-threaded Indy based servers. This has to do with the way node.js and JavaScript in general are architected. No other system in the world has seen more development by a diverse range of programmers than the V8 JavaScript virtual machine and its accompanying WebKit HTML5 rendering engine. Apple, Google and thousands of freelance developers have contributed. WebKit is the de facto browser engine for Android, iOS, OS X and the majority of embedded systems out there.

Streams and their shortcomings

Streams are considered industry standard. Every programming language has them, be it BASIC dialects, C++ or Object Pascal. They provide a clear-cut interface for reading, writing and dealing with binary data.

What streams lack however, are operations. The kind of operations you will sooner or later have to deal with when it comes to working with complex binary data.

Take something simple, like an insert() function. This has been lacking in Delphi since the very beginning. When you write data to a stream, it is either appended or it overwrites existing content (depending on where the position cursor is).

Imagine for instance that you want to insert 1 megabyte of data into a file 6 gigabytes in size. How will you do this? Remember you are inserting into the middle of the file, not appending at the end. And we are not talking about a simple write operation: you are not allowed to overwrite any of the existing data. An insert() operation must inject the data without overwriting or causing damage to a single byte.

Such an operation needs a few steps:

  • Grow the file by 1 megabyte
  • Move the data that follows the destination mark forward by 1 megabyte
  • Write the injected data to the destination mark
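The three steps above can be sketched against a plain TStream. This is only an illustration of the technique, not the HexBuffers implementation: InsertIntoStream and InsertText are hypothetical helper names, and the 64 KB chunk size is an arbitrary assumption. The tail is moved last chunk first, so no byte is overwritten before it has been copied.

```pascal
program InsertSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  Classes, SysUtils;

(* Hypothetical helper, not part of HexBuffers: inserts Count bytes from
   Data at Position and pushes the existing tail forward. *)
procedure InsertIntoStream(Stream: TStream; Position: Int64;
  const Data; Count: Integer);
const
  ChunkSize = 64 * 1024;
var
  Chunk: array of Byte;
  OldSize, Remaining, ReadFrom: Int64;
  ToMove: Integer;
begin
  if Count <= 0 then Exit;
  OldSize := Stream.Size;
  if (Position < 0) or (Position > OldSize) then
    raise ERangeError.Create('Insert position outside stream');

  // Step 1: grow the stream by the size of the inserted data
  Stream.Size := OldSize + Count;

  // Step 2: move the tail forward, starting with its last chunk
  SetLength(Chunk, ChunkSize);
  Remaining := OldSize - Position;
  while Remaining > 0 do
  begin
    if Remaining > ChunkSize then ToMove := ChunkSize
    else ToMove := Integer(Remaining);
    ReadFrom := Position + Remaining - ToMove;
    Stream.Position := ReadFrom;
    Stream.ReadBuffer(Chunk[0], ToMove);
    Stream.Position := ReadFrom + Count;
    Stream.WriteBuffer(Chunk[0], ToMove);
    Dec(Remaining, ToMove);
  end;

  // Step 3: write the new data into the gap that was opened
  Stream.Position := Position;
  Stream.WriteBuffer(Data, Count);
end;

(* Demo wrapper: runs the insert against an in-memory stream *)
function InsertText(const Base: AnsiString; Position: Integer;
  const Extra: AnsiString): AnsiString;
var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    if Base <> '' then M.WriteBuffer(Base[1], Length(Base));
    if Extra <> '' then
      InsertIntoStream(M, Position, Extra[1], Length(Extra));
    SetLength(Result, Integer(M.Size));
    M.Position := 0;
    if Result <> '' then M.ReadBuffer(Result[1], Length(Result));
  finally
    M.Free;
  end;
end;

begin
  Writeln(InsertText('HelloWorld', 5, ', '));  // prints "Hello, World"
end.
```

The same routine works on a TFileStream, which is where the chunked copy matters: only one chunk is held in memory at a time, however large the file is.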

You may be wondering what type of scenario would demand that you inject a megabyte of data into a huge 6 gigabyte file, but that is beside the point. HexLicense deals with license data, data that must be protected and made hard to obtain or extract. Being capable of hiding license data inside other files should be considered a basic operation.

But inserting is not the only thing lacking from ordinary streams. Delete, or some form of remove() operation, is equally important. And equally missing. Let's say you need to delete 1 megabyte from the same file. Perhaps it's a record in your own database system, perhaps it's a license that is invalid; there can be 100 reasons why you need to perform this operation. But Delphi's TStream class simply has no such complex features.
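A remove operation is the mirror image of an insert: the data after the removed range is copied backwards, first chunk first, and the stream is then truncated. The sketch below shows the idea on a plain TStream. RemoveFromStream and RemoveText are hypothetical names used for illustration, not the HexBuffers code.

```pascal
program RemoveSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  Classes, SysUtils;

(* Hypothetical helper, not part of HexBuffers: deletes Count bytes at
   Position and closes the gap by pulling the tail backwards. *)
procedure RemoveFromStream(Stream: TStream; Position, Count: Int64);
const
  ChunkSize = 64 * 1024;
var
  Chunk: array of Byte;
  OldSize, ReadFrom: Int64;
  ToMove: Integer;
begin
  OldSize := Stream.Size;
  if (Position < 0) or (Count < 0) or (Position + Count > OldSize) then
    raise ERangeError.Create('Range to remove lies outside stream');
  if Count = 0 then Exit;

  // Copy the tail backwards over the removed range, front to back
  SetLength(Chunk, ChunkSize);
  ReadFrom := Position + Count;
  while ReadFrom < OldSize do
  begin
    if OldSize - ReadFrom > ChunkSize then ToMove := ChunkSize
    else ToMove := Integer(OldSize - ReadFrom);
    Stream.Position := ReadFrom;
    Stream.ReadBuffer(Chunk[0], ToMove);
    Stream.Position := ReadFrom - Count;
    Stream.WriteBuffer(Chunk[0], ToMove);
    Inc(ReadFrom, ToMove);
  end;

  // Truncate: the last Count bytes are now stale duplicates
  Stream.Size := OldSize - Count;
end;

(* Demo wrapper: runs the removal against an in-memory stream *)
function RemoveText(const Base: AnsiString;
  Position, Count: Integer): AnsiString;
var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    if Base <> '' then M.WriteBuffer(Base[1], Length(Base));
    RemoveFromStream(M, Position, Count);
    SetLength(Result, Integer(M.Size));
    M.Position := 0;
    if Result <> '' then M.ReadBuffer(Result[1], Length(Result));
  finally
    M.Free;
  end;
end;

begin
  Writeln(RemoveText('Hello, World', 5, 2));  // prints "HelloWorld"
end.
```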

But it doesn't stop there. Duplicating data, injecting, copying and pasting, performing live compression or encryption, or downloading content directly from the internet: all of these can be done with streams, but only with a great deal of work. Neither Delphi nor C++ offers a ready-to-use solution.

THexBuffer

Disk or memory, it matters not

THexBuffer is an abstract class. It implements a range of methods similar to TStream, but it has more features than streams in any language can offer. What is important to underline is that there is no difference between memory based buffers or disk buffers. The classes implement the exact same behavior (just like streams do) regardless of medium. Considering the complex features buffers have to offer, that should make you thrilled.

In short: A buffer represents either a memory allocation or file allocation. It provides the same functionality and abilities to perform advanced and complex file operations on both, regardless of medium. But you are free to implement other mediums as you please.

In fact, you could easily inherit from THexBuffer and have it read and write data to a remote file on a server. If your server exposes some basic file IO functions, the buffer methods would not care or notice that the data is actually read from a remote location. They are completely abstracted from the storage medium.

Bridging buffers and standard Delphi IO

Making buffers work seamlessly side by side with ordinary streams is naturally very important. Otherwise data would have to be copied, rendering all the low-level power of buffers impressive, but worthless in real-life software.

To make sure this is not the case we ship with a class called THexStreamAdapter. This is a virtual stream class (inheriting from TCustomStream) that delegates all stream operations to whatever buffer it is connected to. This is extremely useful when you have performed complex operations using HexBuffers and now need to interface with Delphi's standard IO mechanisms (read: classes expecting TStream rather than a buffer).
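To make the adapter idea concrete, here is a minimal sketch of the pattern. It does not use the real THexStreamAdapter: TTinyBuffer and TTinyStreamAdapter are hypothetical stand-ins, and the sketch derives directly from the RTL's TStream. The adapter only keeps a cursor; every read and write is delegated to the buffer, so no data is copied into a separate stream.

```pascal
program AdapterSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  Classes, SysUtils;

type
  (* Hypothetical stand-in for a buffer: random access by byte index,
     loosely mirroring the Read/Write signatures of THexBuffer *)
  TTinyBuffer = class
  private
    FData: array of Byte;
  public
    function Size: Int64;
    function Read(const ByteIndex: Int64; DataLength: Integer; var Data): Integer;
    function Write(const ByteIndex: Int64; DataLength: Integer; const Data): Integer;
  end;

  (* Stream facade: owns only a cursor, delegates all IO to the buffer *)
  TTinyStreamAdapter = class(TStream)
  private
    FBuffer: TTinyBuffer;
    FPosition: Int64;
  public
    constructor Create(ABuffer: TTinyBuffer);
    function Read(var Buffer; Count: Longint): Longint; override;
    function Write(const Buffer; Count: Longint): Longint; override;
    function Seek(const Offset: Int64; Origin: TSeekOrigin): Int64; override;
  end;

function TTinyBuffer.Size: Int64;
begin
  Result := Length(FData);
end;

function TTinyBuffer.Read(const ByteIndex: Int64; DataLength: Integer; var Data): Integer;
begin
  Result := 0;
  if (ByteIndex < 0) or (ByteIndex >= Size) or (DataLength <= 0) then Exit;
  Result := DataLength;
  if ByteIndex + Result > Size then Result := Integer(Size - ByteIndex);
  Move(FData[Integer(ByteIndex)], Data, Result);
end;

function TTinyBuffer.Write(const ByteIndex: Int64; DataLength: Integer; const Data): Integer;
begin
  Result := 0;
  if (ByteIndex < 0) or (DataLength <= 0) then Exit;
  if ByteIndex + DataLength > Size then
    SetLength(FData, Integer(ByteIndex + DataLength));  // grow on demand
  Move(Data, FData[Integer(ByteIndex)], DataLength);
  Result := DataLength;
end;

constructor TTinyStreamAdapter.Create(ABuffer: TTinyBuffer);
begin
  inherited Create;
  FBuffer := ABuffer;
end;

function TTinyStreamAdapter.Read(var Buffer; Count: Longint): Longint;
begin
  Result := FBuffer.Read(FPosition, Count, Buffer);
  Inc(FPosition, Result);
end;

function TTinyStreamAdapter.Write(const Buffer; Count: Longint): Longint;
begin
  Result := FBuffer.Write(FPosition, Count, Buffer);
  Inc(FPosition, Result);
end;

function TTinyStreamAdapter.Seek(const Offset: Int64; Origin: TSeekOrigin): Int64;
begin
  case Origin of
    soBeginning: FPosition := Offset;
    soCurrent:   FPosition := FPosition + Offset;
    soEnd:       FPosition := FBuffer.Size + Offset;
  end;
  if FPosition < 0 then FPosition := 0;
  Result := FPosition;
end;

(* Demo: write through the adapter, then hand it to a TStream consumer *)
function LoadViaAdapter(const Text: AnsiString): string;
var
  Buf: TTinyBuffer;
  Adapter: TTinyStreamAdapter;
  Lines: TStringList;
begin
  Result := '';
  Buf := TTinyBuffer.Create;
  Adapter := TTinyStreamAdapter.Create(Buf);
  Lines := TStringList.Create;
  try
    if Text <> '' then Adapter.WriteBuffer(Text[1], Length(Text));
    Adapter.Position := 0;
    Lines.LoadFromStream(Adapter);  // TStringList only knows about TStream
    if Lines.Count > 0 then Result := Lines[0];
  finally
    Lines.Free;
    Adapter.Free;
    Buf.Free;
  end;
end;

begin
  Writeln(LoadViaAdapter('license=valid'));
end.
```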

So you do not lose anything by using buffers, you gain a very rich and highly flexible framework for dealing with binary files and data.

What does a buffer class look like?

Here is the base class, THexBuffer, that all buffer types inherit from (we removed some protected sections to save space). This should give you some idea of the complexity the unit offers. As of writing, the unit is close to 7000 lines of code, all of it representing the means to work with binary data efficiently and in an object-oriented fashion.

THexBuffer = class(TInterfacedPersistent)
Protected
  (*  Extended persistence.
      The function ObjectHasData() is called by the normal VCL
      DefineProperties to determine if there is any data to save.
      The other methods are invoked before and after either loading or
      saving object data.

      NOTE: To extend the functionality of this object please override
      ReadObject() and WriteObject(). The object will take care of
      everything else. *)
  function ObjectHasData: boolean;
  procedure BeforeReadObject;virtual;
  procedure AfterReadObject;virtual;
  procedure BeforeWriteObject;virtual;
  procedure AfterWriteObject;virtual;
  procedure ReadObject(Reader: TReader);virtual;
  procedure WriteObject(Writer: TWriter);virtual;
Protected
  (* Call this to determine if the object is empty (holds no data) *)
  function GetEmpty: boolean;virtual;
Protected
  (* Actual Buffer implementation. Descendants must override and
     implement these methods. It does not matter where or how the
     data is stored - this is up to the implementation. *)
  function DoGetCapabilities: THexBufferCapabilities;virtual;abstract;
  function DoGetDataSize: Int64;virtual;abstract;
  procedure DoReleaseData;virtual;abstract;
  procedure DoGrowDataBy(const Value: integer);virtual;abstract;
  procedure DoShrinkDataBy(const Value: integer);virtual;abstract;
  procedure DoReadData(Start: Int64; var Buffer;
      BufLen: integer);virtual;abstract;
  procedure DoWriteData(Start: int64; const Buffer;
      BufLen: integer);virtual;abstract;
  procedure DoFillData(Start: Int64; FillLength: Int64;
            const Data; DataLen: integer);virtual;
  procedure DoZeroData;virtual;
Public
  property  Empty: boolean read GetEmpty;
  property  Capabilities: THexBufferCapabilities read FCaps;
  property  Size: int64 read DoGetDataSize write SetSize;

 {$IFDEF BR_SUPPORT_ZLIB}
 property   ZLibDeflateEvents: THexDeflateEvents read FZDEvents write FZDEvents;
 property   ZLibInflateEvents: THexInflateEvents read FZIEvents write FZIEvents;
 {$ENDIF}

  procedure Assign(Source: TPersistent);Override;

  (* Read from buffer content to memory *)
  function  Read(const ByteIndex: int64; DataLength: integer; var Data): integer;overload;

  (* Write to buffer content from memory *)
  function  Write(const ByteIndex: int64; DataLength: integer; const Data): integer;overload;

  (* Append data to end-of-buffer from various sources *)
  procedure Append(const Buffer:THexBuffer);overload;
  procedure Append(const Stream:TStream);overload;
  procedure Append(const Data;const DataLength:integer);overload;

  (* Fill the buffer with a repeating pattern of data *)
  function  Fill(const ByteIndex: int64;
            const FillLength: int64;
            const DataSource; const DataSourceLength: integer): int64;

  (* Fill the buffer with the value zero *)
  procedure Zero;

  (*  Insert data into the buffer. Note: This is not a simple "overwrite"
      insertion. It will inject the new data and push whatever data is
      successive to the byteindex forward *)
  procedure Insert(const ByteIndex: int64;const Source; DataLength: integer);overload;
  procedure Insert(ByteIndex: int64; const Source:THexBuffer);overload;

  (* Remove X number of bytes from the buffer at any given position *)
  procedure Remove(const ByteIndex: int64; DataLength: integer);

  (* Simple binary search inside the buffer *)
  function Search(const Data; const DataLength: integer; var FoundbyteIndex: int64): boolean;

  (*  Push data into the buffer from the beginning, which moves the current
      data already present forward *)
  function  Push(const Source; DataLength:integer):integer;

  (* Pull data out of the buffer, again starting at the beginning of the
     buffer. The pulled data is removed from the buffer *)
  function  Pull(Var Target; DataLength:integer):integer;

  (* Generate a normal DWORD ELF-hashcode from the content *)
  function  HashCode:Longword;virtual;

  (* Standard IO methods. Please note that these are different from those
     used by persistence. These methods do not tag the content but load
     it directly. The methods used by persistence will tag the data with
     a length variable *)
  procedure   LoadFromFile(Filename: string);
  procedure   SaveToFile(Filename: string);
  procedure   SaveToStream(Stream: TStream);
  procedure   LoadFromStream(Stream: TStream);

  (* Export data from the buffer into various output targets *)
  function    ExportTo(ByteIndex: Int64; DataLength: integer; const Writer: TWriter):integer;

  (* Import data from various input sources *)
  function    ImportFrom(ByteIndex: Int64; DataLength: integer; const Reader:TReader):integer;

 {$IFDEF BR_SUPPORT_ZLIB}
  procedure   CompressTo(const Target:THexBuffer);overload;
  procedure   DeCompressFrom(const Source:THexBuffer);overload;
  function    CompressTo:THexBuffer;overload;
  function    DecompressTo:THexBuffer;overload;
  procedure   Compress;
  procedure   Decompress;
  {$ENDIF}

  (* release the current content of the buffer *)
  procedure   Release;
  procedure   AfterConstruction;Override;
  procedure   BeforeDestruction;Override;

  (* Generic ELF-HASH methods *)
  class function ElfHash(const Data; DataLength: integer):longword;overload;
  class function ElfHash(const Text: string): longword;overload;

  (* Kernighan and Ritchie Hash ["The C Programming Language"] *)
  class function KAndRHash(const Data; DataLength: integer): longword;overload;
  class function KAndRHash(const Text: string): longword;overload;

  (* Generic Adler32 hash *)
  class function AdlerHash(const Adler: Cardinal; const Data; DataLength: integer): longword; overload;
  class function AdlerHash(const Data; DataLength: integer): longword;overload;
  class function AdlerHash(const Text: string): longword;overload;

  class function BorlandHash(const Data; DataLength: integer): longword;overload;
  class function BorlandHash(const Text: string): longword;overload;

  class function BobJenkinsHash(const Data; DataLength: integer): longword;overload;
  class function BobJenkinsHash(const Text: string): longword;overload;

  (* Generic memory fill methods *)
  class procedure Fillbyte(Target:Pbyte;
        const FillSize:integer;const Value: byte);

  class procedure FillWord(Target:PWord;
        const FillSize:integer;const Value: Word);

  class procedure FillTriple(dstAddr:PHexTriplebyte;
        const inCount:integer;const Value: THexTriplebyte);

  class procedure FillLong(dstAddr:PLongword;const inCount:integer;
        const Value: Longword);
end;

As you can see THexBuffer offers some very handy functions. It must also be stressed that the entire unit is written from scratch, every method is implemented by us – including the hashing methods (these are also present in Delphi, but not like this).
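For readers unfamiliar with the ELF hash behind HashCode and ElfHash, the widely published form of the algorithm is short enough to show in full. The sketch below is that textbook version, written under the assumption that the unit follows the same scheme. It is not the code from the HexBuffers unit, and ElfHashSketch and ElfHashText are hypothetical names.

```pascal
program ElfHashSketchDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$Q-}{$R-}  // the hash arithmetic is meant to wrap around silently

uses
  SysUtils;

(* Textbook ELF hash over a block of memory: shift in four bits per byte
   and fold the top nibble back into the lower bits when it fills up. *)
function ElfHashSketch(const Data; DataLength: Integer): LongWord;
var
  P: PByte;
  I: Integer;
  G: LongWord;
begin
  Result := 0;
  P := @Data;
  for I := 1 to DataLength do
  begin
    Result := (Result shl 4) + P^;
    G := Result and $F0000000;
    if G <> 0 then
      Result := Result xor (G shr 24);
    Result := Result and not G;  // clear the top nibble again
    Inc(P);
  end;
end;

(* Convenience wrapper for single-byte strings *)
function ElfHashText(const Text: AnsiString): LongWord;
begin
  if Text = '' then
    Result := 0
  else
    Result := ElfHashSketch(Text[1], Length(Text));
end;

begin
  Writeln(ElfHashText('ab'));  // prints 1650, i.e. (97 shl 4) + 98
end.
```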

Performing in place compression

Compression made easy

Let's say you have a stream with some data that you want to compress. In Delphi that means creating a target stream for the result, then creating a stream compression instance to glue source and target together, before you perform a copy operation that actually makes use of the compression routine. While this may be well abstracted and future-proof, we feel it's overkill in the extreme for such a simple thing. The only option in question should really be the compression mechanism, but since Delphi only ships with ZLib out of the box, we even refactored that out.
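For comparison, the conventional stream-based route described above looks roughly like the sketch below. It assumes the RTL's ZLib unit in Delphi (the zstream unit in Free Pascal) and the classic TCompressionStream.Create(level, dest) constructor. CompressStream, DecompressStream and RoundTripOk are hypothetical helper names.

```pascal
program ZLibStreamSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  Classes, SysUtils,
  {$IFDEF FPC} zstream {$ELSE} ZLib {$ENDIF};

(* Compress all of Source into a newly created memory stream *)
function CompressStream(Source: TStream): TMemoryStream;
var
  Packer: TCompressionStream;
begin
  Result := TMemoryStream.Create;
  try
    Packer := TCompressionStream.Create(clDefault, Result);
    try
      Source.Position := 0;
      Packer.CopyFrom(Source, Source.Size);
    finally
      Packer.Free;  // freeing the compressor flushes the remaining output
    end;
    Result.Position := 0;
  except
    Result.Free;
    raise;
  end;
end;

(* Decompress Source (from its current position) into a new stream *)
function DecompressStream(Source: TStream): TMemoryStream;
var
  Unpacker: TDecompressionStream;
  Chunk: array[0..4095] of Byte;
  Got: Integer;
begin
  Result := TMemoryStream.Create;
  try
    Unpacker := TDecompressionStream.Create(Source);
    try
      repeat
        Got := Unpacker.Read(Chunk, SizeOf(Chunk));
        if Got > 0 then Result.WriteBuffer(Chunk, Got);
      until Got <= 0;
    finally
      Unpacker.Free;
    end;
    Result.Position := 0;
  except
    Result.Free;
    raise;
  end;
end;

(* Demo: compress, decompress and compare with the original bytes *)
function RoundTripOk(const Text: AnsiString): Boolean;
var
  Plain, Squeezed, Restored: TMemoryStream;
begin
  Squeezed := nil;
  Restored := nil;
  Plain := TMemoryStream.Create;
  try
    if Text <> '' then Plain.WriteBuffer(Text[1], Length(Text));
    Squeezed := CompressStream(Plain);
    Restored := DecompressStream(Squeezed);
    Result := (Restored.Size = Plain.Size) and
      CompareMem(Restored.Memory, Plain.Memory, Integer(Plain.Size));
  finally
    Restored.Free;
    Squeezed.Free;
    Plain.Free;
  end;
end;

begin
  Writeln(RoundTripOk(StringOfChar(AnsiChar('A'), 10000)));
end.
```

Two extra objects and a copy loop per direction is the overhead the buffer API is meant to remove.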

So this is now a one-liner:

FBuffer.Compress();

This naturally has its matching opposite in the Decompress() method.

You can also compress and emit the data in a different buffer, using the CompressTo() method (and its corresponding DecompressTo() method). THexBuffer gives you a lot of input options, be they streams, direct memory access or even TReader. We have tried to make buffers as useful and simple to deploy as possible.

In memory records

The unit ships with a class called THexRecord, a class that has a number of field classes associated with it. This class essentially represents an in-memory record, but a fully object-oriented one. You create a record by adding fields manually, or by allowing the object to create them for you on demand. When you write to a field, it checks whether a field with that name exists; if it does not, it can create the field for you.

This class and its child classes are exceptionally handy when working with custom file formats, for example when you are writing a header to your own database file format. Not to mention working with networking and raw, binary packets. Such code quickly turns into a mess, but with THexRecord you get to approach the task from a purely object-oriented viewpoint.

Write your own database engine

The HexBuffers unit has actually been used to create a fully operational, 100% Delphi-only flat-file database (amongst many other things). There is a class called THexPartsAccess that allows you to access a file as fixed-size pages. Essentially a page represents a chunk or block of binary data. A database record can require several pages to be stored (normally a database page is 1024 or 4096 bytes in size). Typically such a page has a "next" field, pointing to the ID number of the next page making up a record.
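The page chain described above can be sketched with a plain stream. Everything in this example is an assumption made for illustration: the TPage layout, the 1024-byte page size, the use of -1 as end marker and the helper names. It does not show the THexPartsAccess API.

```pascal
program PageChainSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  Classes, SysUtils;

const
  PageSize    = 1024;
  PayloadSize = PageSize - 8;  // page minus the two 4-byte header fields

type
  (* Hypothetical page layout: a small header followed by payload *)
  TPage = packed record
    Next: LongInt;  // ID of the next page of the record, -1 = last page
    Used: LongInt;  // number of payload bytes in use
    Payload: array[0..PayloadSize - 1] of Byte;
  end;

procedure WritePage(Stream: TStream; ID: LongInt; const Page: TPage);
begin
  Stream.Position := Int64(ID) * PageSize;
  Stream.WriteBuffer(Page, SizeOf(Page));
end;

procedure ReadPage(Stream: TStream; ID: LongInt; out Page: TPage);
begin
  Stream.Position := Int64(ID) * PageSize;
  Stream.ReadBuffer(Page, SizeOf(Page));
end;

(* Build a page holding Text; the text must fit into one payload *)
function MakePage(const Text: AnsiString; Next: LongInt): TPage;
begin
  FillChar(Result, SizeOf(Result), 0);
  Result.Next := Next;
  Result.Used := Length(Text);
  if Result.Used > 0 then
    Move(Text[1], Result.Payload[0], Result.Used);
end;

(* Follow the "next" fields and glue the payloads together.
   A real implementation would also guard against cyclic chains. *)
function ReadRecord(Stream: TStream; FirstPage: LongInt): AnsiString;
var
  Page: TPage;
  ID: LongInt;
  Old: Integer;
begin
  Result := '';
  ID := FirstPage;
  while ID >= 0 do
  begin
    ReadPage(Stream, ID, Page);
    if (Page.Used < 0) or (Page.Used > PayloadSize) then
      raise EReadError.Create('Corrupt page header');
    Old := Length(Result);
    SetLength(Result, Old + Page.Used);
    if Page.Used > 0 then
      Move(Page.Payload[0], Result[Old + 1], Page.Used);
    ID := Page.Next;
  end;
end;

(* Demo: one record stored in pages 0 and 2, page 1 left unused *)
function DemoChain: AnsiString;
var
  F: TMemoryStream;
begin
  F := TMemoryStream.Create;
  try
    F.Size := 3 * PageSize;
    WritePage(F, 0, MakePage('LICENSE-', 2));
    WritePage(F, 2, MakePage('DATA', -1));
    Result := ReadRecord(F, 0);
  finally
    F.Free;
  end;
end;

begin
  Writeln(DemoChain);  // prints "LICENSE-DATA"
end.
```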

Roll your own, it's not that hard

You typically use a bitmap, a range of bits each representing a block, to keep track of which blocks are free and which are in use by records. So when you delete a record you don't have to shrink the file each time (that would be very time-consuming); you simply mark the blocks as unused so they can be recycled. You then isolate a full cleanup in a procedure like Compact() or similar.
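A block bitmap of this kind takes only a few lines. The sketch below is a minimal illustration with hypothetical names (TBlockMap, Allocate, Release) and is not taken from the HexBuffers unit. One bit per block, a linear scan for the first free bit, and deleting simply clears bits.

```pascal
program BlockMapSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils;

type
  (* Hypothetical allocation map: bit set = block in use *)
  TBlockMap = class
  private
    FBits: array of Byte;
    FCount: Integer;
  public
    constructor Create(BlockCount: Integer);
    function InUse(Block: Integer): Boolean;
    function Allocate: Integer;           // returns a block ID, or -1 if full
    procedure Release(Block: Integer);    // marks a block as reusable
  end;

constructor TBlockMap.Create(BlockCount: Integer);
begin
  inherited Create;
  FCount := BlockCount;
  SetLength(FBits, (BlockCount + 7) div 8);  // zero-filled: all blocks free
end;

function TBlockMap.InUse(Block: Integer): Boolean;
begin
  Result := (FBits[Block shr 3] and (1 shl (Block and 7))) <> 0;
end;

function TBlockMap.Allocate: Integer;
var
  I: Integer;
begin
  for I := 0 to FCount - 1 do
    if not InUse(I) then
    begin
      FBits[I shr 3] := FBits[I shr 3] or (1 shl (I and 7));
      Result := I;
      Exit;
    end;
  Result := -1;
end;

procedure TBlockMap.Release(Block: Integer);
begin
  // Nothing is moved or shrunk on disk; the bit is just cleared
  FBits[Block shr 3] := FBits[Block shr 3] and not (1 shl (Block and 7));
end;

(* Demo: take blocks 0, 1 and 2, free block 1, then allocate again *)
function DemoRecycle: Integer;
var
  Map: TBlockMap;
begin
  Map := TBlockMap.Create(16);
  try
    Map.Allocate;
    Map.Allocate;
    Map.Allocate;
    Map.Release(1);
    Result := Map.Allocate;  // the freed block is handed out again
  finally
    Map.Free;
  end;
end;

begin
  Writeln(DemoRecycle);  // prints 1
end.
```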

So writing complex software is not hard when you have the right tools. And HexBuffers will give you a lot of interesting new possibilities for working with licensing and where/how you store your license.

It is also a solid foundation on which to build more and more complex license behavior. It is platform independent, framework independent and will run just as well on iOS as it will on a Linux server.