Sunday, December 29, 2013

Going safe, Secure forwarding

Requirements: MySQL back-end database, Apache and PHP 5.4+ installed and running

Inspired from the traffic control capability of Websense and AVG web security, my new coding partner, Pradeep, and I tried to provide you with a script doing an enhanced version of those tools' job. Indeed, the PHP script provides high quality security to outbound traffic, by warning the user of the reputation of the outside website and the potential dangers (if any) s/he is exposed to by visiting this last.
It has been designed to protect a local user of a web application from having confidential data (cookies, credit card secret code,...) being stolen via known hacking techniques like XSS.

1) How it works: The algorithm

Consider the following scenario:



Suppose all the unknown links on the page P1 are on the form :
<a href="p2.php?link=urlencode(xyz)">xyz</a>

So here is the scenario.
a) User is actually browsing on P1. S/he sees a link xyz and wants to visit that link without knowing the risk(s) s/he's exposing himself/herself to
b) When s/he clicks on it, s/he is sent to P2 to check that link first before being forwarded [Step 1 on the figure]
c) P2 open a session (you can make it secure with ssl if you want) with the BAP and check the purity/cleanness of xyz  [Step2 on the figure].The BAP can be anything: a database, an encrypted file or a customized server. It's up to you
Basically, The BAP stores a list of links that have bad reputations over internet due to their content (malware, botnets, pornography,..).
d) Once the search is completed in the BAP, the BAP returns a token to P2 whose value determines if that link xyz is clean or not.Then BAP also closes the session [Step 3 on the figure]
e) If the link has been classified as "dirty" with respect to the returned token, then P2 alerts the user on P1 that s/he's heading to a bad website xyz
[Step 4 on the figure]
{
  If the user acknowledges the risk and still wants to be forwarded, P2 forwards the user at his/her own risks and records the incidence in a log file [Step 5 on the figure]
  else P2 returns to P1
 }
otherwise the if the link has been classified as "clean" with respect to the returned token, then P2 simply forwards the user to xyz
[Step 5 on the figure]

2) Want to test it?

Download source
 
I know I was supposed to write about network design and configuration as promised in the last week's sneak peek. Some last minute unforeseen issues came through but don't worry,  I will post about it asap in 2014. Meantime enjoy the holidays and Happy New Year from the #CodijuanaTeam!

Sunday, November 17, 2013

Network X Web Servers: What if you could setup everything from scratch?

 You're a dummy programmer who's always been dealing with pre-configured hosting environment like CPanel et al. But this time, you want to take it to an upper level of control on the hosting, your own servers, your own configuration, your own security. Well, the following sequence of articles is for you. I will cut the chase later
but for now, you have to understand the network matrix which can be summarized in the following topics:  how internet works,  OSI model, TCP/IP,  IP addressing/subnetting, Routers, Firewalls and Quality of Service.

1) Digesting the network matrix in a few lines
       In order to easily get what follows, you have to keep in mind that network communication is pretty much similar to human communication, but in more conventional and adaptive manner. In any friendly talk (network), when you (device 1) talk(sending request) to another person (device 2) using your voice (data), you expect the person to reply (response via data). If that person doesn't want to hear what you're saying, he just has to clog his ears (firewall). Now, in bigger and complex conversations (internet) such as debates, conferences,.. there is a need of a moderator (router) to arbiter the communication between groups of people.
Now, let's go deeper by explaining the reverse: How machines communicate in a network. But before, I have to introduce you to a model called the Open Systems Interconnection (OSI) model that was built by the International Organization of Standardization (aka ISO) in 1984 to facilitate the understanding of this type of communication. You might ask yourself how? Well, let me fly over a sweet example that will enlighten you.
The OSI model simply formalizes the communication between machines over a network by splitting the communication flow into layers. The following conversation is a client-to-server PHP webpage request - http://www.example.com/index.php

Client Server (example.com)
Application layer (L7): 
The client  requests a web page by typing in a web address (a URL) in the application (browser)
Application layer (L7): 
The web server application (PHP from the LAMP* stack) processes the data and produce an (HTTP) response(4)
Presentation layer (L6): 
The request is formatted into HTTP generic request message (3)
Presentation layer (L6):
The request is formatted into HTTP generic response message (3)
Session layer (L5): 

The browser initiates an (HTTP) session by opening a TCP connection to the (HTTP) server with which it wishes to communicate.

Your web browser automatically opens additional TCP connections to the server and request those media

   -------------------------------- >>>






  <<< --------------------------------
Session layer (L5):

The server sends back the response / web page and closes the connection. 
                       |
                       |
Your web browser then parses the HTML of the web page to read the instructions (HTML tags) which tell the browser where to find additional files to be displayed within the web page such as style sheets, images,.... 
Transport layer (L4): 
The opened TCP connection(s) breaks up the request message(s) into managable chunks, labels them with numbers so they can be reassembled in the correct order and transport the pieces across the correct session
<<< ----------------------------- >>> Transport layer (L4): 
reorder the data stream if the incoming packets are out of order, multiplexing data in case of different flows to applications on the same hosts
Network layer (L3): 
Internet Protocol (IP) provides unique addresses for the web server and for your computer, then creates a message addressed to the web server and to be sent via your default gateway
<<< ----------------------------- >>> Network layer (L3): 
read each received packet to check if any matches any access restriction or filtering feature set at the computer firewall
Data layer (L2): 
Your computer uses ARP to figure out the physical MAC address of the default gateway and then passes the messages to the network card. Once there, each  chunked message is transformed into a network request/packet then forwarded to the Internet via the detected gateway
Data layer (L2): 
The network card reads the received packets and converts them into packets readable by your machine
Physical layer (L1): 
transmits the packets to the default gateway.
-------------------------------------------- Physical layer (L1): 
receives the transmitted packets and send to the network card

*LAMP: Linux-Apache-MySQL-PHP (5)

Such modeled separation can be applied to any type of computers' communication to understand or study this last; the only difficult part is to put the right element or process of the communication in the right layer.
However, the example above is an easy one and is only valid for a small number of machines. If you want a huge multiple computers' conversation like internet, things will get jammed and your network will look like a loopy bin.


Figure 1: Traditional devices communication

That's why, you will need a moderator that will work at the layer 3 (TCP/IP) to deal with the routing of information using the computers' IP. Such moderator is called a router. And what it does, is that it divides the multiple computers' conversation into sub-conversations. After incorporating routers, your network will look like this:


Figure 2: Modern devices communication

This schema (Figure 2), when projected at a global scale, is basically called internet.

Each router has a routing table to compute the next hop for a packet. The routing table stores the routes (and in some cases, metrics associated with those routes) to particular network destinations. Your routing table is created automatically, based on the current TCP/IP configuration of your Linux / UNIX computer. You can manually add / modify / edit routing table using route and ip command on Linux (6). In addition, some routers also provide security via NAT/DNAT, additional firewalls (to not be confused with the ones of your computer) and Quality of Service  as we'll see in our future articles while setting up our own environment.

Now you know that if you don't have valid destination IP address to match in the routing table, you might have trouble in your conversation (this is not new since it's the same scenario in human conversation :) ).

But:
- If the communication is only from IP to IP, how does my machine communicate via an URL in the browser? 
-How does it work inside each sub-network monitored by a router? 
-How about the interfacing between your computer and the router itself? 

Ok ok. One question at a time.
The communication over internet is indeed IP-to-IP. Your machine has no clue of what an URL is. At L7 (see the above OSI-modeled communication example), when the user provides the URL, your computer contacts your DNS servers to convert the supplied URL into an IP address, which is later interpreted by your device as a sequence of octets through a binary division. The following figure illustrated this translation process:
                   
                                   Browser
                        http://www.example.com
                                        DNS
                 74               125             228               17
                                      Device
   01001010    01111101  11100001  00010001

Figure 3: Interpretation sequence of a URL by a device within a network

 For Linux-users, you can find the IP addresses of your DNS servers in the file "/etc/resolv.conf". Those DNS servers' IPs are automatically configured by the router of the sub-network to which you belong. to

Now talking about sub-networks, two things to keep in mind are: private IP and public IP. At the early beginning, everybody used public IPs to communicate. After the arrival of routers, for security reasons and to prevent an eventual shortage of public IPs, private IPs had been integrated for internal communication within sub-networks. And then, when a request, sent from your machine at L7,  reached L3, your router would convert your private IP into public IP to be attached to each packet header. However, no matter the number of private IPs (hosts) in your sub-network, the public IP is the same.
The private IP is automatically assigned to the network card's interface of your device by the router of the sub-network to which you are connected and must follow the standards set by RFC 1918. Here is the range assigned private IPs depending on the class of your network.

IP address range  Number of addresses  Class of the network
10.0.0.0 - 10.255.255.255  16,777,216  class A
172.16.0.0 - 172.31.255.255  1,048,576  class B
192.168.0.0 - 192.168.255.255  65,536  class C

And yes, two machines from two (2) different sub-networks can have the same private IP; collision of private IPs is allowed as long as the machines belong to different networks.
Moreover, the structure of the private IP address reveals 2(two) major hidden lines: an identifier of your sub-network and an identifier for your machine. These lasts can be easily unmasked by reading another sequence called the subnet mask. For example, if the private IP of my wireless card's interface(wlan0) is 192.168.1.1 and my subnet mask is 255.255.255.0. We simply use an alignment reading technique to figure it out.
We stick both sequences and divide from  the 0-side of the subnet mask
    192.168.1.
255.255.255.
1
0
The red container is the network identifier and the green one is the identifier of the device on the network. So now, you can check yours by typing in the command line, ifconfig for Linux and ipconfig for Windows.

Another use of the subnet mask is in the design of network. For instance, let's say, after buying your router, you connect your laptop, and type ifconfig and notice your private IP is 192.168.1.0. Then, you speak to yourself and say: " I want to have a network/sub-network with 254 hosts". Using that amount, we can determine the subnet mask for your future network and thus the splitting above.
Remember (from Table 1) that an IP address is nothing more than a sequence of 4 bytes, that is 4 sequences of  8 bits.
 _ _ _ _  _ _ _ _     _ _ _ _ _ _ _ _    _ _ _ _ _ _ _ _   _ _ _ _ _ _ _ _ 

There's a formula to use to determine depending on the starting point: the willing number D of devices OR the willing number N of subnets/networks.
D = 2n- n,  N = 2n          
 where  n is the number of bits to skip before splitting
 For D, we count from the right and for N, we count from left  

So for the network we want to build,  it will be: 2n- n = 254 => n = 8
Hence our subnet will be:
                    
  11111111    11111111    11111111  _ _ _ _ _ _ _ _ 
     255        255      255           0 

Knowing that our subnet mask will be 255.255.255.0, we can now use this information with the targeted amount of hosts to allocate private IP addresses and their corresponding lease time to the network's devices via the DHCP  This process is the manual allocation of IP addresses. Despite its effectiveness, this method is only suitable for company-size network (for example, when allocating an IP address to a company's FTP server). For minor cases like home networks,  the IP allocation is usually done automatically by the router.

And because each sub-network holds a private conversation and the router is the one in charge converting your insider(private)IP into a public one X , the inside interface of the router must have a private IP; meantime its outside interface has the public IP address X.

2) Subsequent themes of this series
Now, you're ready to move forward. Please note that the implementations will be entirely done in a Linux environment.
- A1: Setting up an all-level ( From the modem to the DMZ via the router)  network (virtual and physical)
- A2: Designing and implementing firewalls and Quality-of-Service for your network
- A3: Configuring and securing the servers for your website - Case study: LAMP
(If you are a web developer)

3) Advanced work
A similar sequential approach can be used depending on the service provided by the network you are developing. In our case, it's a hosting network. But you can have a network configuration dedicated to supply cloud services (E.g: IaaS platforms like Amazon EC2), Storage services (E.g: Dropbox), Distributed delivery services (E.g: CDNs)...etc

4) References
1- Designing and Implementing firewalls and QoS with Linux using netfilter, iproute2, NAT and l7-filter, 2006, Lucian Gheorghe
2- http://www.inetdaemon.com/tutorials/basic_concepts/network_models/osi_model/OSI_model_real_world_example.shtml
3- http://www.tcpipguide.com/free/t_HTTPRequestMessageFormat.htm
4- http://www.tcpipguide.com/free/t_HTTPResponseMessageFormat.htm#Figure_318
5- http://en.wikipedia.org/wiki/LAMP_%28software_bundle%29
6- http://www.cyberciti.biz/faq/what-is-a-routing-table/
7- http://en.wikipedia.org/wiki/Private_network
8- http://www.tarunz.org/~vassilii/TAU/protocols/dhcp/ipaddr.htm

Thanks for your time and don't forget to comment and share this article, add me on G+ and subscribe to the blog's RSS feeds

Sunday, November 10, 2013

Case #3: Private trusted identity (RFC 3161)

Requirements: openSSL (v1.0.0+), PHP and Apache installed and running - valid allowance from a TimeStamping Authority (aka TSA), Exec and Curl extensions enabled in your PHP environment
Now, imagine you have personal files, group of files, secret contracts, explicit pictures, even HTTP headers and many more to which you can give a perfect secrecy or files that you can securely share, all of this with only one keyword: an identity
For instance, there is an exclusive movie coming out next week and you are the only one within the torrent social network KickAssTorrent who was able to have it. However you want to share it with your peers but at the same time, don't want to be caught by the media authorities. [ Yeah, I know ; as a security guy I'm suppose to give delightful examples, but let's be a bit silly for this week, shall we? ]. So you have to protect yourself by protecting the access to the torrent and your identity (keep it private). This is done by giving access to the torrent to only trusted peers.
Such achievement resides in the innovative cradle that we'll call the "Private Trusted Identity (ID for short)" and is brought to you by Codijuana, trademark of Landry Kouajiep.
 

(1) How does it works? 

While publishing the torrent of the brand-new movie, your identity (username or id in the database ...) is encrypted using the alphapad permutation cipher. Then, the torrent T is saved and a list of trusted users for this specific file T is created. Next, both the ciphered id and hash of T are pushed in an array that is serialized in a string which is also hashed.The resulting hash is used to request a RFC3161 timestamp from a TSA via openssl and curl. The TSA's response is collected and the timestamp is extracted. At the end, we have: the torrent file T, the list of trusted users (i.e people who can access T without validation), the timestamp and the TSA response.

Now if a peer wants to access that torrent (the torrent itself and the identity of its owner), he must provide his identity, the encrypted id of the torrent's owner and the hash of that torrent. Unfortunately for the media authorities or the attackers, the last 2 cited credentials are only accessible to peers that are trusted by the torrent's owner :P .

The theory behind this reasoning is the following: If you can prove you knew a file's hash and the encrypted id of its owner at a certain point in time, that guarantees you knew the file content and its owner.
 
(2) How is it different from the others?

I guess some of you will call it a "deja vu". Well, let me give 3 reasons that'll wipe that impression from your mind:

Better than the usual digital signature: Apart from verification as is the main purpose of digital signature, this method is an authentication-authorization system. For instance, instead of  only signing a Git commit on GitHub, we would also be able to control and monitor the access to that commit using only openssl and curl; no more exhaustful tracking codes; only certificates and the lightweight alphapad permutation cipher. Total empowerment of commits' security.

Deeper than passwords: It not only allows you to protect data layers (files, accounts,..) but can also be used to keep the communication layers safe (http header compression for instance)

More powerful than SSL: It has recently been shown that TLS can be broken in less than a minute using the BREACH attack which abuses the difference of size in the response bodies to infer the original response. The embedded alpha-padding permutation cipher takes advantage of its OTP's characteristics to provide the perfect secrecy and the control on the variation of the response bodies' sizes (inherited from the control on the padding parameter) which seriously gives more control on the variation of the BREACH's chances / probabilities of breaking the cipher.

(3) Enough talking. Can you show us the code and explain it?

Sure. I have to precise that the illustrated code is divided into two main sections respecting the above introduction to the concept. The first section covers "WHAT HAPPENS WHEN THE OWNER SAVES HIS DATA/FILE" while the subsequent takes in charge "WHAT HAPPENS WHEN ANOTHER USER TRIES TO ACCESS  THE DATA/FILE". Also the data in the first lines of codes of each section are only examples and can be changed at your own taste. However, for less trouble in your experiments, I recommend you the following:

- For an identity (the variables $my_id in section 3.1 and $provided_id in section 3.2) you should use characters from the alphabet you defined in the alphapadding permutation cipher. More details about this alphabet here.
- For the hash in section 3.2, please make sure it's 40 characters (letters and digits) long since I'm using sha1 here.You can change the hash algorithm too. But make sure to adjust the request to the TSA to match this change.

3.1 WHAT HAPPENS WHEN THE OWNER SAVES HIS DATA/FILE

<?php 

include_once "class.alphapad.php";
 //- (1) We 1st set user data and the system
$system_pad = 300;
$system_key = 90;

$permutation = new alphaPad($system_key,$system_pad);  
$my_id = "john smith"; //- The owner id provided by the system 
$my_id = $permutation->_encrypt($my_id); //- set private by encryption

$data = $my_id."_movie_name_".time();
$saved_file = $my_id."/".$data.".torrent"; //- the original document to protect
chmod($saved_file,600); //- to make sure nobody except the owner can access it
 
$trusted_ids = $my_id."/".$my_id."-list.txt"; //- and create a list of trusted people
$cmd = "grep '".$my_id."' ".$trusted_ids." > result-".$my_id.".out"; exec($cmd);
if (filesize("result-".$my_id.".out") == 0) file_put_contents($trusted_ids, $my_id."\n"); //- starting by the owner himself

chmod($saved_file,600);

 // Another way to build a trusted list is by using suPHP to chown files from here to the appropriate user/group and add trusted ids to the owner's group, but the effect will be that the web server no longer can read/edit those files once you change ownership of them.
 
$my_hash = sha1($data);
$data_string = serialize(array("hash" => $my_hash,"alpha_pad_id" => $my_id)); $fhash = sha1($data_string);

//- (2) Then request a timestamp using OpenSSL and capture the response
// The request is HTTP POST binary data in the header that is passed to the server using curl.
You'll need to have a bit of info for this:
//  URL - An RFC3161 time-stamping service (DFN-Verein in our case)
//  FHASH - The final hash (set up earlier) you want a timestamp for. Must be a full hash.

$cmd0 = '
CONTENT_TYPE="Content-Type: application/timestamp-query"
ACCEPT_TYPE="Accept: application/timestamp-reply"
URL = "http://zeitstempel.dfn.de" '
;
exec($cmd0);

$cmd1 = 'openssl ts -query -cert -digest '.escapeshellarg($fhash).' -sha1 \
    | curl -s -S -H "$CONTENT_TYPE" -H "$ACCEPT_TYPE" --data-binary @- "$URL" -o response.tsr '
;


//- What $cmd1 does is the following: we create a time stamp request
//- The output is a time stamp request that contains the SHA1 hash value of your data; ready to be sent to the TSA (DFN-Verein in our case - see the bottom of the article for more TSA).
//- Then after the "|" in the command,i.e once the request is ready, the curl program transmits your request to the DFN-Verein servers.
//- If an error occurs then the output file will contain information to help debug (see the parameter -S in the command).
//- Otherwise the output file (.tsr file) is the RFC3161 timestamp response of your file is returned


exec($cmd1." 2>&1", $array, $status);
if ($status !== 0) throw new Exception("OpenSSL does not seem to be installed: ".implode(", ", $array));
if (stripos($array[0], "openssl:Error") !== false) throw new Exception("There was an error with OpenSSL. Is version >= 0.99 installed?: ".implode(", ", $retarray));

//- (3) We now verify the response by extracting the timestamp and if valid, we save that response string/token

$cmd2 = "openssl ts -reply -in response.tsr -text";
$timestamp_response_array = execute_reply($cmd2, $my_id."_".$my_hash);
$matches = array();
$response_time = 0;
foreach ($timestamp_response_array as $retline){
  if (preg_match("~^Time\sstamp\:\s(.*)~", $retline, $matches)){
            $response_time = strtotime($matches[1]);
            break;   
  }
}

//- For a better understanding of this extraction, please take a look at the post following this section

if (!$response_time)throw new Exception("The Timestamp was not found");
echo "File and identity saved, safe and sound! \n

           Here are the credentials for your trusted peers:\n
           hashID:".$my_id."\n
           hashTorrent: ".$my_hash;

function execute_reply($command, $storage_name) {
    $retarray = array();
    exec($command." 2>&1", $retarray, $retcode);
    // 2>&1 appendix redirects the error-stream STDERR to STDOUT
    // all the STDOUTed lines are then sequentially pushed in the returned array $retarray
    // $retcode contains the return status of the executed command


    if ($retcode !== 0) throw new Exception("The reply failed: ".implode(", ", $retarray));
    else {
        //- We gather the response token in a file for future authentications
        $tmpfname = tempnam("/responses", $storage_name); //- tempnam will chmod the file to 600 i.e unalterable except by the owner of the file
        $save_cmd = "echo '".$command."' > ".$tmpfname."";
        unlink("request.tsq"); unlink ("response.tsr");
    }
    return $retarray;
}


During the request for a timestamp via the execute_reply(), the STDOUTed lines are the content chunks of the timestamp response that look like this:

    Using configuration from C:\OpenSSL-Win64\bin\openssl.cfg
    Status info:
    Status: Granted.
    Status description: unspecified
    Failure info: unspecified
    TST info:
    Version: 1
    Policy OID: 1.3.6.1.4.1.4146.2.2
    Hash Algorithm: sha256
    Message data:
        0000 – 58 df 63 8c 5b bf ff ca-ad 13 c9 6e 93 96 cd 25   X.c.[......n...%
        0010 - 66 5e f1 eb ba 8e 7f 74-6d 65 04 3c 5d ea e4 35   f^.....tme.<]..5
    Serial number: 0x2487F5EA8A5A085844ED68A8A7426E07E692E1BD
    Time stamp: Sep 17 05:08:38 2013 GMT
    Accuracy: unspecified
    Ordering: no
    Nonce: unspecified
    TSA: DirName:/C=SG/O=GMO GlobalSign Pte Ltd/CN=GlobalSign TSA for Standard – G1
    Extensions:

 

 In other words, the returned array $timestamp_response_array is simply the above splitted by line. The hash that response was requested for is : 58df638c5bbfffcaad13c96e9396cd 25665ef1ebba8e7f746d65043c5deae435 (see the line starting with: "Message data:..."), the used hash algorithm is sha256 and the TSA is GlobalSign.
Also for simplicity, we assume that the TSA servers that we're using have an availability of 99.9%. For a deeper consideration, you might want to consider the failover scenario and schedule a waiting list of alternate servers.
 /*
 * Summary:
 * INPUT:  data T, private identity
 * OUTPUT:  the timestamp for T, a trusted list and the response token from the TSA

 */

3.2 WHAT HAPPENS WHEN ANOTHER USER TRIES TO ACCESS THE DATA/FILE

Before crawling the validation procedure, let's have a brief chat about the certificate chain of an TS authority (let's called it A).

 * An authority A's certificate chain is a file that is used to trace back to
an offline cryptographic appliance called the root certificate authority (root CA) where all the private keys are kept in a safe and physically secured facility that meets the security, policy and operational practices that all CAs are mandated to meet. By installing such chain in your browser, your browser can tell if the certificate of A was issued by a root CA or not . The time-stamping service A should be signing your timestamp with a certificate that was issued by a root CA.  If not, your timestamp doesn't have much credibility (d). Its signature, at the bottom of its response, (For instance, the line starting by "TSA: DirName:/C=SG/...." in the example bove) indicates that the root CA believes that practices of the authority A below it meets that same high bar.
 To better understand the relationship between the root CAs and the intermediate authorities like A, I suggest you to take a look at the trust tree from the ICSI certificate notary.


How does a certificate chain look like?A certificate chain is a usually a certificate of extension .pem or a text file (.txt) and looks like this (courtesy of DFN-Verein): https://pki.pca.dfn.de/global-services-ca/pub/cacert/chain.txt

A certificate chain can be easily created using curl or provided to you by your TSA.
You can also download this bigger certificate chain (with more root CAs): http://curl.haxx.se/ca/cacert.pem


WHY AM I TALKING ABOUT AN AUTHORITY CERTIFICATE CHAIN? Because it's an important piece of the authentication procedure as we will se below.

<?php 

include_once "class.alphapad.php";
 $system_pad = 300;
$system_key = 90;

$permutation = new alphaPad($system_key,$system_pad);

$user_id = "carla joe"; //- user who is trying to access file T 
$user_id = $permutation->_encrypt($user_id);

$provided_id = 12232323434439999; //- should be the same as $my_id for the authentication to work
$provided_hash = "e47b8eef9eb2bedca76dcdd4041d3e1755e2324a"; //- should be the same as $my_hash for the authentication to work
 
$user_string = serialize(array("hash" => $provided_hash,"alpha_pad_id" => $
provided_id)); $user_hash = sha1($user_string);


$response_token = $
provided_id."_".$provided_hash;
$file_timestamp = $response_time; //- we suppose it has been saved in the system after the process in the precedent section

$tsa_cert_chain_file = "chain.txt"; //- the certificate chain from the TSA of our system
$trusted_ids =
$provided_id."/".$provided_id."-list.txt";

if (file_exists($trusted_ids)){
    $cmd = "grep '".$user_id."' ".$trusted_ids." > result-".$
provided_id."-".$user_id.".out";         

    exec($cmd);
   
    if (filesize("result-".
$provided_id."-".$user_id.".out") == 0) {
       $validated = validate($user_hash, $response_token, $file_timestamp, $tsa_cert_chain_file, $user_id);
       if ($validated == 1) {
            file_put_contents($trusted_ids, $user_id."\n");
            echo "You're in!";
        }
        else throw new Exception( "Hash mismatch! Please try again");
    }
    else echo "You're in!";
}
else {
     error_log_type(1, $user_id);
     throw new Exception("The file you are trying to access doesn't exist");   
}

function validate ($hash, $response_token, $response_time, $tsa_cert_file, $user) {
   
    if (strlen($hash) !== 40) {
         error_log_type(1, $user, "Provided file's hash");
         throw new Exception("Invalid Hash");
    }
    $response_file = './responses/'.$response_token;
    if (!file_exists($response_file)) {
        error_log_type(1, $user, "Name of the response token");
        throw new Exception("Invalid Hash or owner ID");
    }
    if (!intval($response_time)){
        error_log_type(1, $user, "Provided timestamp");
        throw new Exception("There is no valid response-time given");
    }
    if (!file_exists($tsa_cert_file)) {
        error_log_type(1, $user, "Path to the certiificate chain");
        throw new Exception("The TSA-Certificate could not be found");
    }

 $cmd = "openssl ts -verify -digest ".escapeshellarg($hash)." -in ".escapeshellarg($response_file)." -CAfile ".escapeshellarg($tsa_cert_file);
   
$array = array();
exec($cmd." 2>&1", $array, $status);
    /*
     *  Just 2 valid cases:
     *  1) Everything okay -> status 0 + array[0] == "Verification: OK"
     *  2) Hash is wrong -> status 1 + strpos(array[somewhere], "message imprint mismatch") !== false
     * Every other case (Certificate not found / invalid / openssl is not installed / ts command not known)  are being handled the same way -> retcode 1 + any retarray NOT containing "message imprint mismatch"
     */

  
    if ($status === 0 && strtolower(trim($array[0])) == "verification: ok")
       return 1;

    foreach ($array as $rline)
    {
        if (stripos($line, "message imprint mismatch") !== false)
           {error_log_type(2, $user, "Provided file's hash"); return 0;}
    }
   
    error_log_type(3, $user, implode(", ", $array));
    throw new Exception("System command failed: ".implode(", ", $array));
}

function error_log_type($severity, $user, $parameter){
    date_default_timezone_set('UTC');
    $log_file = "./error.log";
    switch ($severity){    
        case(1): $error = "Wrong information provided";break;
        case(2): $error = "Hash's print match failed";break;
        case(3): $error = "Fatal error - System command failed";break;
    }
    $error = "[ ".date('l jS \of F Y h:i:s A')." ] - ".$error." - [ Author: ".$user.", Parameter: ".$parameter."] \n";
    file_put_contents(
$log_file, $error);
}


/*
 * Summary:
 * INPUT: user's private identity, file T owner's private identity and the hash of the file T
 * OUTPUT: Access to file F containing T or not (reporting the user's trials)
 */

  
(4) Important notes
 [1] Before running the code, you MUST have  the allowance from the TSA (DFN in our case) to use their service. More about it here.

Download source

[2] Due to the presence of the subpadding in the alphapadding permutation cipher, the encrypted identity is subjected to identity collision in a case of a multiple-users system. A solution is, of course, to remove the subpadding substraction from the encryption, but this will increase the inference power of an attacker as explained in the "Extended Work" section of the precedent article. 

(5) What are the reference hotlines? 

References: 
(a) https://www.digistamp.com/technical/software-alternatives/using-openssl-to-request-timestamps/?q=1
(b) http://www.d-mueller.de/blog/dealing-with-trusted-timestamps-in-php-rfc-3161/
(c) http://stackoverflow.com/questions/11913228/how-can-i-use-rfc3161-trusted-timestamps-to-prove-the-age-of-commits-in-my-git
(d) http://unmitigatedrisk.com/?p=395 | http://unmitigatedrisk.com/?p=397
(e) http://notary.icsi.berkeley.edu/trust-tree/ 
(f)  http://searchsecurity.techtarget.com/definition/digital-signature
(g) http://keyserver.trustedtimestamping.com:11371/
(h) http://breachattack.com/

  More TSAs:
 * GlobalSign (https://www.globalsign.com/timestamp-service/)
 * DigiStamp (https://www.digistamp.com/)

 * DFN-Verein (https://www.dfn.de/en/‎)
 * All CA roots: http://notary.icsi.berkeley.edu/trust-tree/ 

Thanks for your time and don't forget to comment and share this article, add me on G+ and subscribe to the blog's RSS feeds

Sunday, October 27, 2013

Case #2: The alpha-padding permutation cipher

Requirements: PHP 5.4+ and Apache web server - both installed and running
Especially for this article, references will be noted by letters instead of numbers.

For this week, I'll be using a different approach of explaining the algorithm that I refer as "understanding in the storm". This will make use of special references within the codes. References will be made to some comments Also after reading some articles on the permutation cipher, I got inspired and developed an enhanced version of this last. Indeed the traditional permutation cipher was shown to be vulnerable to brute force attacks and frequency analysis.

First of all, let me quickly explain how permutation ciphers work (You can skip it though).

1.The traditional permutation cipher
We choose a keyword, and split the plaintext (the text to encrypt) into blocks that are the same length as the keyword. We write this in columns beneath the keyword( It can be represented as arrays in programming). We then label each keyword letter in alphabetical order (if there are duplicates we take them in order of appearance). Now we reorder the columns, so that the numbers are in order (the letters of the keyword are in alphabetical order). We now read across the rows[a].
For instance, suppose you want to process this plaintext using the key beac: kanel is awesome

The two tables show how it goes using the procedure described above.
   
            b       e       a      c
            2       5      1      3
            k        a        n        e
            l                   i         s
                      a        w       e
            s        o        m       e
  
====>              a       b       c      e
             1       2      3      5
             n         k       e        a
             i           l       s        
             w                 e        a
             m        s       e        o
The original plaintext broken into blocks having the same lenght than the key and sorted in labeled columns The blocks ordered after sorting the key's digits

 Therefore, after encrypting, the result (the ciphertext) will be: nkeails w eamseo. You see now the relation with anagrams. The plaintext and the ciphertext are anagrams in permutation ciphers.
However, this technique has a lot of inconviniences:
(a) The information relating to the length of the key is revealed by the length of the ciphertext
(b) Such permutation is breakable by trying out different permutations until you find a first block that makes sense. Once you have worked out the permutation, you can then use this to decrypt the rest of the intercept easily. That's brute force attack.
(c) Another way to break it is by frequency analysis. Basically, you consider some specific commonly used letters of the permutation's language (english, french,...) that are parts of the ciphertext, then you study their frequency [b] in order to determine which one is more likely in the plaintext and thus, infer the plaintext.

Looking at these, I said to myself: "With such flaws, such cipher is definitely useless". So I started working on a more mature useful permutation-type cipher that I called: the alpha-padding permutation cipher .

2.The alpha-padding permutation cipher

Inspired from the traditional transposition cipher, the essence of this encryption method resides in three(3) main pieces: an alphabet, a padding parameter and the key.

2.1 How it works:
First, you choose a key and a padding parameter in line with the following requirements:
- Both must be numbers
- Both must be contain at least two(2) digits 
Now during the encryption, as its predecessor, the original plaintext is broken into blocks having the same length than the key and sorted in labelled columns. But this time, before the blocks are ordered after sorting the key's digits, each character of the plaintext-side content of the columns is mapped to a number based on an alphabet and its position in the alphabet; and the position is then padded using the padding parameter. A child parameter called subpad, based on the key and the padding parameter, is computed and  debited substracted from the total. The subpad is the cumulative join of the padding parameter's half within the length of the key.

Example:
Consider again the same plaintext above: kanel is awesome but now for the key, we will take 2513 and for the padding parameter, we choose 10 for instance. We define our alphabet as follows:  "00"=>" ","01"=>"a","02"=>"b", 
"03"=>"c","04"=>"d","05"=>"e","06"=>"f", "07"=>"g","08"=>"h","09"=>"i",
 "10"=>"j","11"=>"k","12"=>"l","13"=>"m","14"=>"n","15"=>"o","16"=>"p", 
"17"=>"q","18"=>"r","19"=>"s","20"=>"t","21"=>"u","22"=>"v","23"=>"w",
"24"=>"x","25"=>"y","26"=>"z","27"=>"_");

  2   5   1   3  
  k    a    n    e
  l           i     s
        a    w   e
  s    o    m   e
=>   2   5   1   3    
  11  01  14  05
  12  00  09  19
  00  01  23  05
  19  15  13  05
=>     2   5   1   3    
     21  11  24  15
     22  10  19  29
     10  21  33  15
     29  25  23  15
=>   1   2   3   5
  24  21 15  11
  19  22 29  10
  33  10 15  11
  23  29 15  25
The original plaintext broken into blocks having the same lenght than the key and sorted in labeled columns Mapping each block's characters to the alphabet Padding/Adding a noise to the map by adding 10 to the content of each block The blocks ordered after sorting the key's digits

The resulting ciphertext will hence be: 24211511192229103310151123291525
Finally, the subpad  (5555) is substracted from the resulting ciphertext  and the new ciphertext is: 24211511192229103310151123285970

In general, one can say that the alpha-padding permutation cipher is a vertical padded sort followed by a horizontal substraction.

For the decryption process, we simply first add the subpad and split the ciphertext into block of lenght 2. Next, for each block, we substract the padding paramater from each block's numbers and map the result to their corresponding alphabet's character. Then we form super-block of length = key's length, index each super-block with the sorted key's digit. Finally, we bring the plaintext back by bringing the indexes(key's digits) to the original order of the key. Without knowledge of the key and the padding parameter, it's almost impossible to crack this cipher.

Pretty simple and powerful; but strict though. As a matter of fact, apart from the initial requirements, two more constraints are added: Because the key's digits serve as array indexes, the key shall not contain repetitive digits (e.g: 1000, 1199,....). Furthermore, how good are the ciphered plaintext and deciphered ciphertext depends on how good is your alphabet. So you might want to revise it.

2.2 The code:
Now, let's take a look at the class behind the alpha-padding process:

 class.alphapad.php

<?php

class alphaPad {
  
   private $key = 0;
   private $padding_par = 0;
   private $alphabet = array ("00"=>" ","01"=>"a","02"=>"b","03"=>"c", 
   "04"=>"d","05"=>"e","06"=>"f","07"=>"g","08"=>"h", "09"=>"i", 
   "10"=>"j","11"=>"k","12"=>"l","13"=>"m","14"=>"n","15"=>"o", 
   "16"=>"p","17"=>"q","18"=>"r","19"=>"s","20"=>"t","21"=>"u", 
   "22"=>"v","23"=>"w","24"=>"x","25"=>"y","26"=>"z","27"=>"_");
    private $sub_par = 0;
    private $length = 0;
    private $plaintext = "";
    private $ciphertext = "";
    private $key_a = array();
      
    public function __construct($key, $padding_par){
        $this->key = $key;
        $this->padding_par = $padding_par;
        $this->length = strlen($this->key);
        $this->key_a = str_split($this->key);
        $this->sub_par = $this-> _sp($this->padding_par, $this->length);
    }
  
    public function _encrypt($plaintext){
        try{
            $this->ciphertext = "";
            $this->plaintext = $plaintext;
          print "<div id='out'>Plaintext: ".$this->plaintext." <br>(key: ".$this->key." - padding parameter: ".$this->padding_par.")";
          
            //- split the plaintext into block of lenght = key lenght [1]
            //- index each block using the key [2]
            //- and sort each block by the key's index [3
]
            $plain_a = str_split($this->plaintext, $this->length); /*[1]*/
          
            foreach ($plain_a as $block) {  /*[2]*/
                //- if a block is not a multiple of $lenght. e.g: "plai ntex t" for $lenght=4
                $empty_space=strlen($block)% $this->length;
                if( $empty_space > 0) {for($j=1; $j<=($this->length-$empty_space); $j++) {$block .="-";}}
              
                $block_a = str_split($block);
                $tmp = array(); //- we put all the changes in a temp array first
                for($i = 0; $i < $this->length; $i++){
                    //- $tmp[$this->key_a[$i]] = $block_a[$i]; - For the primitive permutation
               $tmp[$this->key_a[$i]] = (int)array_search($block_a[$i], $this->alphabet)+$this->padding_par; /*[2]*/
                    //- unmatchable index keys will be converted to 0
                    unset($block_a[$i]);  
                }
                $block_a = $tmp; unset($tmp);
                ksort($block_a);  /*[3]*/
                $this->ciphertext .= implode("",$block_a);
            }
            $this->ciphertext = $this->ciphertext - $this->sub_par;
            print "<br>Ciphertext: ".number_format($this->ciphertext, 0, '.', '');
        }
        catch(Exception $e){
            die ("An error occured");
        }
      
    }
  
    private function alpha(&$item, $key, $parameter){
        $item = (int)$item;
        $item = $item - $parameter;
        if(strlen($item) == 1) $item ="0".$item;
        if (isset($this->alphabet[$item])) $item$this->alphabet[$item];
        else $item = "-";            
    }
  
    public function _decrypt($ciphertext){
        try{      
            $this->plaintext = "";
            $this->ciphertext = $ciphertext;
            $this->ciphertext = $this->ciphertext +$this->sub_par; 
            print "<div id='out'>Ciphertext: ".$this->ciphertext." <br>(key: ".$this->key." - padding parameter: ".$this->padding_par.")";

           //- split the ciphertext into block of lenght = string lenght of the items in $alphabet = 2 [1]
            //- for each block, unpad and map to the alphabet character [2]
            //- form super-block of lenght = key lenght [3]
            //- sort the key in ascending order [4]
            //- index each super-block with the sorted key [5]
            //- bring the plaintext back by bringing the indexes(key digits) to the original order of the key. [6]


            $cipher_a = str_split($this->ciphertext, 2); /*[1]*/
          
            if($this->length%count($this->ciphertext)==0) {
                      
                array_walk( $cipher_a , array($this,'alpha'), $this->padding_par); /*[2]*/
                $cipher_a = array_chunk( $cipher_a, $this->length); /*[3]*/
                asort($this->key_a); /*[4]*/
              
                foreach($cipher_a as $block_a){
                    $tmp = array(); $i= 0;
                    foreach($this->key_a as $k=>$v){  /*[5]*/
                        if(isset($block_a[$i])) {
                            $tmp[$v] = $block_a[$i];
                            unset($block_a[$i]);
                        }
                        else $tmp[$v] = "-"; //- e.g: 25 31 21 --
                        $i++;
                    }
                    $block_a = $tmp; unset($tmp);
                    $block_a = $this->referential_ksort($block_a,$this->key); /*[6]*/
                    $this->plaintext .= implode("", $block_a);
                 }
                 print "<br>Plaintext: ".$this->plaintext;
                
             }
             else echo "Invalid ciphertext";
        }
        catch(Exception $e){
            die ("An error occured");
        }
    }
  
    private function referential_ksort($array, $sequence){
        //- sorting by array index and also by following a given sequence
        //- thus all the array keys must be members of the given sequence

        $sequence_a = str_split($sequence);$temp = array();
        foreach ($sequence_a as $s) $temp[$s] = $array[$s];
        unset($array);
        return $temp;
    }
  
    private function _sp($pad,$len){
        $sp = "";
        for($i=0; $i<$len; $i++) $sp.=(int)($pad/2); //- so that the substraction will always be > 0
        return $sp;
    }
}

2.3 Complexities (Time and Space):

Let l be length of the key, n the length of the original input array
We note n/l as the round up of n/l

For the encryption:
#Time complexity: O(l n/l) 
In the method _encrypt(), during the "for" loop [2], n/l blocks (from the initial block division) are traversed. Then in each loop, we have another loop of l rounds giving a total of l n/l time units. The computation of the $sub_pad requires l time units because of the "for" loop involved in the process of the method _sp(). However it's not included in the time complexity since it's computed at the same time the class is instantiated.
#Space complexity: O(l+n)
Again in the method _encrypt(), the computation of $key_a requires l space units, all the $block_a/$plain_a instances after splitting the plaintext takes n space units, the transition from $block_a to $tmp  and the other fixed variables  take only one (1) unit space.

For the decryption:
#Time complexity: O(n/l+(2l)n/l) 
In the quest of applying the method alpha(), the method _decrypt() will take a single time unit to traverse each of the n/l blocks. Similarly to _encrypt, the super 'for' loop will traverse each of the n/l block via another loop of l rounds; but this time, we have a game changer called referential_ksort() in each the l-rounds loop causing a total consumption of 2l n/l time units for this 2-dimensional traversal. 
#Space complexity: O(2l+n)
As in encrypt(), $key_a costs l space units, all the $block_a/$cipher_a take n space units. However the variable $sequence_a in referential_ksort in the 2-dimensional loop will cost an additional l. And we have the transition from $block_a to $tmp and the other fixed variable demanding a single space unit.
 
2.4 The results:
The test of its code was done in a regular Apache-PHP environment, nothing special) and shows encouraging results

 View the results|Download source

Memory overhead: Given a specific cipher, we define the memory overhead as follow: 
overhead = (size_of_plaintext / size_of_ciphertext)-1
In the experiment, the following function was in charge of capturing the memory footprint during the ciphering:

function sizeofvar($var) {
   $start_memory = memory_get_usage();
   $tmp = unserialize(serialize($var));
   return memory_get_usage() - $start_memory;
}

Note that this is in no way a reliable method, you can't be sure that nothing else touches memory while assigning the variable, so this should only be used as an approximation. The figure bellow shows the results:

   Loading ...   

Execution time: To efficiently test the running time of the cipher over plaintexts of random length,  we measured 5 times the running time over 5 different values of the padding parameter (10,47,84,135,206,313) and over 5 different values of the key (12,98,105,376,1010). For each round, we increased the length of the input plaintext. The results are plotted in figure below.

   Loading ... 

False positives: By false positives, we mean the scenario where the encryption always works but decryption not. Indeed, during the experimental phase, we found that false positives occur when the key's length is 1 (one) and the value of the padding parameter is 79 and above. That's why we suggest again to use keys containing at least 2 (two) digits.

2.5 Some practices in real-world:

#Name-based URLs: The detailed technique in this article can be used to match a username to a specific code. For instance, in a social network or any other dynamic computer system where user's data are stored in a XML file named using a user-based code (such code can be created by passing the user name to our alpha-padding permutation cipher, you might have a scenario where the user "johndoe" request to see the data of  "smith" via the URL: http://www.example.com/smith. This request will load the file 1929302324181210-data.xml (1929302324181210 is the ciphertext for "smith" for the key 2513 and the padding parameter 10).

#Digital signatures: Another practical example of our cipher is found in the identity management area. It can be used to protect files using a collaboration with trusted time-stamping. This scenario will be the topic of our next article. The result of such collaboration will be called Private Trusted Identity. In this case,  a document is created by someone called "John Doe Sr.". After signing the document with something he is (his name/social security number/fingerprint/...), the saving procedure will used his name to create a code using our alpha-padding permutation cipher. This code will be then joined to a time-stamp provided by a time-stamp authority to form a private key or digital signature. The document, the private key and the time-stamp will be rowed in a database. During the access to the document by another bud, the digital signature will be verified first before the green light is given.

Extended work:
A deeper research on the alpha-padding permutation cipher includes, but is not limited to:
- The sub-pad parameter (the variable $sub_pad variable in the code): The resulting ciphertext can be really large for some plaintext and therefore might not be supported by some PHP environment. In our presented study, the sub-pad parameter cannot not be longer than the key. Thus, future works will aim to provide a larger sub-pad parameter in order to considerably reduce the length of the ciphertext and hence the inference power of the attacker.
-  Another vista to pay attention is the customization of the alphabet. A more diverse and mature alphabet will give a bigger input field from the user's perspective. However such tailoring will require a conscious  adjustment of the cipher.

References:
(a) http://crypto.interactive-maths.com/permutation-cipher.html
(b) http://homepages.math.uic.edu/~leon/mcs425-s08/handouts/breaking_tranposition_cipher.pdf
(c) http://web.cs.du.edu/~ramki/courses/security/2011Winter/notes/Lecture4.pdf
(d) http://everything2.com/title/Permutation+cipher
(e) http://stackoverflow.com/questions/2192657/how-to-determine-the-memory-footprint-size-of-a-variable 

Thanks for your time and don't forget to comment and share this article, add me on G+ and subscribe to the blog's RSS feeds