Seal | Verification error handling

I would like to bring over a discussion originally raised in our GitLab.

The original ticket was created for the Web Verification App, but this topic also applies to Seal add-ons and extensions.

Generally, the verification flow consists of four steps:

  1. Reading QR code data and obtaining status id
  2. Reading array of statuses for specified status id
  3. Statuses verification
  4. Verification of the attachments

What could possibly go wrong during the whole verification routine

Reading QR code data

As we know, the HEAD part of the QR code data is passed as a base64-encoded protobuf message in the ?q= argument of the query string (e.g. https://office.app.vereign.com/?q=CiDENc3Vva99Emghtc7xA4YkspjDlR7feoTqgQSuRIiT6RIgrhgyERdaNCyFOsUjkZJeXfY7K2jQ8tIvuRlSsqZZgJk=&timestamp=1610541100482).

  • The first error might happen while decoding the ?q= base64 string, in case q is not a valid protobuf message.
  • The next step is fetching the encrypted TAIL part from the CDN. The TAIL might not be available yet because data is uploaded to the CDN asynchronously. In that case we should show a “pending” state.
  • The final step is decryption of the assembled QR code data. Decryption might fail if one of the keys is broken.
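The first of these steps could be sketched roughly as follows. This is a minimal sketch, not the actual app code: `readHead`, `HeadResult` and the cause strings are hypothetical names, and protobuf decoding, the TAIL fetch and decryption are deliberately left out.

```typescript
// Hypothetical result type; the names are illustrative, not from the Seal code base.
type HeadResult =
  | { kind: "ok"; headBytes: Uint8Array; timestamp: number | null }
  | { kind: "error"; cause: string };

// Extracts the raw HEAD bytes from a verification URL.
// Decoding the bytes as a protobuf message is a separate step not shown here.
function readHead(verificationUrl: string): HeadResult {
  let url: URL;
  try {
    url = new URL(verificationUrl);
  } catch {
    return { kind: "error", cause: "URL is malformed" };
  }

  const q = url.searchParams.get("q");
  if (!q) {
    return { kind: "error", cause: "q argument is missing" };
  }

  let binary: string;
  try {
    // URLSearchParams turns "+" into a space, so restore it before decoding.
    binary = atob(q.replace(/ /g, "+"));
  } catch {
    return { kind: "error", cause: "q argument is not valid base64" };
  }

  const headBytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  const rawTimestamp = url.searchParams.get("timestamp");
  const timestamp =
    rawTimestamp !== null && /^\d+$/.test(rawTimestamp) ? Number(rawTimestamp) : null;
  return { kind: "ok", headBytes, timestamp };
}
```

Returning a tagged result instead of throwing keeps each failure tied to a specific cause, which the UI can then map to a message.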

Once the whole routine is done, the application obtains the following structure:

{
  statusId,
  sender,
  subject,
  date,
  recipients,
  attachments,
  senderPublicKeyUuid
}

Reading array of statuses

  • First of all, statuses might be pending in the backend queue and not available yet.
  • Once statuses are fetched, the application reads the public key of the sender from the CDN. The public key might not be available yet because data is uploaded to the CDN asynchronously.
  • After that, the flow verifies the status of the sender using the public key retrieved in the previous step. This verification might fail as well if something is wrong with either the signature or the public key. This would mean that the status might not belong to the original sender.
  • There’s also a possibility that more than one sender status is present in the array due to a bug in the browser extension/add-on. Currently this case is handled as a verification error.
  • The last step is selecting the recipients’ statuses by taking everything in the array except for the status of the sender. Due to a possible bug in the extension, a recipient status might be duplicated as well, but the landing page currently has no way to detect that. It’s just going to show something like “Receipt record exists for 4 recipients of 2”.
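To illustrate the last two points, here is a minimal sketch of splitting the array into the sender status and recipient statuses. The `Status` shape and the `fromSender` flag are assumptions made for the example; how the real code recognizes the sender status is not described here.

```typescript
// Hypothetical shape: "fromSender" stands for however the flow identifies
// the status that was verified with the sender's public key.
interface Status {
  id: string;
  fromSender: boolean;
}

type SplitResult =
  | { kind: "ok"; sender: Status; recipients: Status[] }
  | { kind: "error"; cause: string };

function splitStatuses(statuses: Status[]): SplitResult {
  const senders = statuses.filter((s) => s.fromSender);
  if (senders.length === 0) {
    return { kind: "error", cause: "Sender status is missing" };
  }
  if (senders.length > 1) {
    // Mirrors the current behaviour: duplicates are a verification error.
    return { kind: "error", cause: "More than one sender status is present" };
  }
  // Everything that is not the sender status counts as a recipient status.
  const recipients = statuses.filter((s) => !s.fromSender);
  return { kind: "ok", sender: senders[0], recipients };
}
```

Note that this sketch, like the landing page, cannot tell whether two recipient statuses belong to the same recipient.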

Verification of the statuses

For every status in the array:

  • The application checks whether a blockchain transaction hash exists. If not, further verification of the status does not make sense yet, and the status is considered pending.
  • The application reads the transaction using the transaction hash. A transaction might be unavailable for various reasons, e.g. the consequences of a 51% attack or a malfunctioning blockchain node.
  • The next step is retrieving the name of the file with the Merkle tree from the transaction. This happens by calling the publicly hosted aeternity compiler.
  • After that, the app retrieves the Merkle tree nodes from the CDN. I assume there’s a chance that the Merkle tree data might not be uploaded yet either.
  • The next step is verification of the Merkle tree.
  • The last step is retrieval of the block information, which might fail in case the block has been reverted.
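The order of these checks could be captured in a small classifier like the one below. It is only a sketch: `StatusChecks` is a hypothetical summary of what each step observed, and the split between "pending" and "failed" for each step is my assumption, not an agreed rule.

```typescript
// Hypothetical summary of what each step observed for a single status.
interface StatusChecks {
  transactionHash: string | null;
  transactionFound: boolean;   // transaction could be read by its hash
  merkleFileResolved: boolean; // file name obtained via the compiler
  merkleNodesFound: boolean;   // nodes fetched from the CDN
  merkleTreeValid: boolean;    // Merkle tree verification passed
  blockFound: boolean;         // block information could be read
}

type StatusResult =
  | { state: "verified" }
  | { state: "pending"; cause: string }
  | { state: "failed"; cause: string };

// Walks the checks in the order listed above and reports the first problem.
function classifyStatus(c: StatusChecks): StatusResult {
  if (!c.transactionHash) {
    return { state: "pending", cause: "Blockchain transaction hash does not exist" };
  }
  if (!c.transactionFound) {
    return { state: "failed", cause: "Transaction is not available" };
  }
  if (!c.merkleFileResolved) {
    return { state: "failed", cause: "Merkle tree file name could not be retrieved" };
  }
  if (!c.merkleNodesFound) {
    // Assumption: treated as pending because the CDN upload is asynchronous.
    return { state: "pending", cause: "Merkle tree data is not available yet" };
  }
  if (!c.merkleTreeValid) {
    return { state: "failed", cause: "Merkle tree verification failed" };
  }
  if (!c.blockFound) {
    return { state: "failed", cause: "Block information is not available" };
  }
  return { state: "verified" };
}
```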

Verification of the attachments

Verification happens by ensuring that every signature present in the QR code data also exists in the sender status data. I am not sure whether this step is correct at all, but I don’t see another way, given that we don’t have access to the contents of the attachments.

For most of the steps in this flow, a simple network error might happen for various reasons, e.g. an interrupted internet connection.

Related topic: Check message status using Vereign Seal

@georg.greve please let me know if I chose the wrong place for this topic.

I think you have already categorized the various errors that can happen in the system pretty well. But let me list the categories here as well:

Reading QR code data

Reading of statuses

Verification of statuses

Verification of the attachments

Verification of email content and metadata

At the moment we have some error handling for Verification of email content and metadata. I think it would be a good idea to use the categories above as error categories and the different causes as error messages.

Let me give you an example:

Category: Verification of statuses failed
Cause: Blockchain transaction hash does not exist
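In code, this pairing could look roughly like the sketch below. The type and function names are hypothetical; only the category names and the example cause come from this thread.

```typescript
// Categories as listed above; the type name itself is illustrative.
type ErrorCategory =
  | "Reading QR code data"
  | "Reading of statuses"
  | "Verification of statuses"
  | "Verification of the attachments"
  | "Verification of email content and metadata";

interface VerificationError {
  category: ErrorCategory;
  cause: string;
}

// Renders an error in the two-line form used in the example above.
function formatError(e: VerificationError): string {
  return `Category: ${e.category} failed\nCause: ${e.cause}`;
}

const example: VerificationError = {
  category: "Verification of statuses",
  cause: "Blockchain transaction hash does not exist",
};
console.log(formatError(example));
```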

Dear @markin.io - thank you for this! For me, the category is OK, although there might have been other choices that would also have been OK. In general it is far more important THAT you wrote this. :heart_eyes: :pray:

As to the different categories I think they make a lot of sense and make things a lot clearer.

But please also keep in mind that the system might also be attacked. Not everything will be a delay or a temporary malfunction, e.g.

We may also encounter issues because the QR code is trying to lead users to a fake landing page (verification app URL mismatch), because the QR code is just garbage, or because it does not have a TAIL since it is a simple counterfeit. Or perhaps the QR code is so old that the TAIL no longer exists. Whatever the reason: TAIL might never show up in these cases.

Obviously the former case cannot be detected by the web verification app, but it could be detected in the add-ins, and then treated with utmost caution. In fact, we should even consider automated alerting in a central repository for all add-ins and users, which would provide a “live threat map” of actors trying to attack the system. Although I think we should do a separate feature story around this.

But if you wait for TAIL in the latter case you might be waiting forever.

So we should ideally find a way to understand whether this is indeed a timing issue, or whether TAIL will never show up. Criteria that might be usable for this:

  • Age of Seal (and correspondingly, TAIL)
  • Time elapsed waiting

At some point we probably want to conclude there won’t be a TAIL and make sure the user is shown some information to that effect. We probably want to determine how to guide the user based on the likely root cause.
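A minimal sketch of how the two criteria could be combined is shown below. Both thresholds and all names are placeholders I made up for illustration; the actual values would need to be agreed on.

```typescript
// Both thresholds are placeholders, not agreed values.
const MAX_UPLOAD_DELAY_MS = 10 * 60 * 1000; // Seal age after which TAIL should already exist
const MAX_WAIT_MS = 30 * 1000;              // how long one session keeps waiting

type TailState = "pending" | "unlikely-to-appear";

// Decides whether a missing TAIL still looks like a timing issue.
function classifyMissingTail(
  sealTimestampMs: number,
  waitStartedMs: number,
  nowMs: number,
): TailState {
  const sealAgeMs = nowMs - sealTimestampMs;
  const waitedMs = nowMs - waitStartedMs;
  if (sealAgeMs > MAX_UPLOAD_DELAY_MS || waitedMs > MAX_WAIT_MS) {
    return "unlikely-to-appear";
  }
  return "pending";
}
```

Giving up when either limit is exceeded is one possible policy. Since a counterfeit QR code can carry any timestamp, the elapsed waiting time acts as a fallback when the Seal age cannot be trusted.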

Dual sender statuses are interesting. We need to decide which one to treat as authoritative in such cases. But dual receiver statuses are likely common because people might use different clients with installed add-ins, and they might use them on top of gateways that provide receipt records.

Question: How robust is our verification that there isn’t already such a status for a given recipient, or might we even see one status per reading?

The landing page can only show statistical information right now, and perhaps we want to reduce the amount of information shown even further.

For the add-ins we can decode the information so we can associate receipt records to recipients, in which case we might want to consider whether and how to render that information.

The blockchain service should make sure that the entry will land on the chain, and that the block hasn’t been reverted. If this is not temporary then something really has gone wrong in a way and we need to make sure to handle it accordingly. Otherwise we would likely want periodic checks and automatically update.

We probably want to check for two things:

  • Is every attachment we see in the Seal also in the message we are verifying?
  • Is every attachment we see in the message also in Seal?

Because compromise might happen in one of three ways: By modifying something that is there, by adding something that wasn’t there, or by removing something that is supposed to be there.

None of these cases is benign. The email standard does not allow for any of these cases.

So whenever any of this is detected, we must raise the alarm.
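The two checks amount to a set comparison in both directions. Here is a minimal sketch, assuming each attachment is represented by some fingerprint string (for example a hash or signature); the function and field names are hypothetical.

```typescript
interface AttachmentDiff {
  missingFromMessage: string[]; // in the Seal but not in the message
  missingFromSeal: string[];    // in the message but not in the Seal
}

// Compares attachment fingerprints in both directions.
function diffAttachments(sealed: string[], inMessage: string[]): AttachmentDiff {
  const sealedSet = new Set(sealed);
  const messageSet = new Set(inMessage);
  return {
    missingFromMessage: sealed.filter((f) => !messageSet.has(f)),
    missingFromSeal: inMessage.filter((f) => !sealedSet.has(f)),
  };
}

// Any difference in either direction should raise the alarm.
function attachmentsIntact(diff: AttachmentDiff): boolean {
  return diff.missingFromMessage.length === 0 && diff.missingFromSeal.length === 0;
}
```

A modified attachment shows up in both lists, because its old fingerprint is missing from the message and its new one is missing from the Seal.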

Network issues we should detect, if possible, show temporary failure, and retry or allow the user to cancel. Is there a way we can do that sensibly?
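One possible shape for this is retry with exponential backoff plus a cancel hook. The sketch below makes several assumptions: `withRetry` and `backoffDelayMs` are hypothetical names, the delays are placeholders, and it retries on any failure rather than detecting network errors specifically.

```typescript
// Delay before the next retry (attempt is 0-based), doubling each time up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries an operation until it succeeds, the attempts run out,
// or the caller aborts through the signal (for example from a Cancel button).
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  signal?: AbortSignal,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (signal?.aborted) {
      throw new Error("Cancelled by user");
    }
    try {
      return await operation();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
    if (attempt < maxAttempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
  throw lastError ?? new Error("No attempts were made");
}
```

A caller could pass `new AbortController().signal` and call `abort()` when the user cancels. While retries are running, the UI would show the temporary failure state.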

Btw, do you both think we can put the current errors into a table with two columns (category and cause), and after that add further categories and causes covering all the additional use cases we have raised?

We can use some kind of machine learning algorithm here for categorization. There are really good algorithms solving such problems already.

To be honest, this sounds more like a bug. If I remember correctly, Vereign Seal is designed so that every account has one public/private key pair used for signing and verification. @perkon correct me if I am wrong, but I remember that was the case. And it seems that the caching and retry strategy for statuses is not implemented.

Btw, once we agree on what has to be done with the dual statuses and their caching logic, I think we should put this into the public documentation.

That is correct, for every account there is only one public/private key pair.

@perkon Do you have any other good ideas how to make the error handling better?

I agree on having categories and creating a reference table with the possible errors.

And then we should make sure that the source code handles those errors and displays the exact messages from the reference table.

I don’t remember if this is discussed, but I think that the errors should also have a severity.

The question is how to express the severity visually to the user. I don’t have any advice about the wording, but I think that we should use pictograms and colors that clearly express the severity.

The users should be able to make a fast judgement about the verification state.

That’s a wonderful suggestion. Can we reuse the severity levels from some of the modern logging libraries? I guess they are explicit enough. What is your opinion on this?

If I understand correctly we have 3 states:

  1. Everything is fine and verified, so there are no errors and all over the place we show green color
  2. Something is wrong, but due to the asynchronous nature of the verification we cannot say for sure - these are warnings with orange color.
  3. Something took too much time without result, or some verification definitely failed - these are errors with red color.

So the message kind can be success, warning or error.

@markin.io correct me if I am wrong

So we should ideally find a way to understand whether this is indeed a timing issue, or whether TAIL will never show up. Criteria that might be usable for this:

  • Age of Seal (and correspondingly, TAIL)
  • Time elapsed waiting

There’s a way to calculate the age. We attach a timestamp alongside the HEAD.

Dual sender statuses are interesting. We need to decide which one to treat as authoritative in such cases.

If multiple sender statuses can be verified using the same public key, we can pick the earliest one. If the signature of one of the statuses cannot be verified, shall we consider the whole chain as possibly compromised?

Question : How robust is our verification that there isn’t already such a status for a given recipient, or might we even see one status per reading?

As I said, the Web Verification App does not seem to have a way to detect that, while the add-ons do. So for the Web Verification App, instead of showing n of m recipients verified, we can say that either all recipients have been verified successfully or some of them have not (without exact numbers).

The blockchain service should make sure that the entry will land on the chain, and that the block hasn’t been reverted.

It makes sure that the transaction has landed on the chain. If all of a sudden it has been reverted, we might have two types of errors:

We probably want to check for two things:

  • Is every attachment we see in the Seal also in the message we are verifying?
  • Is every attachment we see in the message also in Seal?

I think this is how things currently work within Add-ons. There’s no way to check that in Web Verification App though, because it does not have access to the actual files.

Network issues we should detect, if possible, show temporary failure, and retry or allow the user to cancel. Is there a way we can do that sensibly?

Yes, currently they are indicated as “Something went wrong, please try again later”.


Verification states could be:

  • Verified
  • Not verified (compromised, malicious and so on)
  • Error (not related to the verification itself, like a network error)
  • Verification Pending (some data is waiting to become available)
  • And one more state would be Data is obsolete, for cases where data is no longer available for various reasons (a third-party service has died, data got wiped out)

I assume that these states can be treated as severity levels.

We can distinguish between pending and obsolete states by measuring the time passed since the seal has been created.
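If the states are treated as severity levels, the overall result could simply be the most severe state seen across all steps. The sketch below assumes one particular ordering, which is my guess and still needs to be agreed on; the names are hypothetical.

```typescript
type VerificationState = "verified" | "pending" | "error" | "obsolete" | "not-verified";

// Assumed ordering from least to most severe; this ordering is not agreed yet.
const SEVERITY: VerificationState[] = [
  "verified",
  "pending",
  "error",
  "obsolete",
  "not-verified",
];

// Returns the most severe state among the per-step states.
// An empty list yields "verified", which real code would probably want to reject.
function overallState(stepStates: VerificationState[]): VerificationState {
  let worst: VerificationState = "verified";
  for (const state of stepStates) {
    if (SEVERITY.indexOf(state) > SEVERITY.indexOf(worst)) {
      worst = state;
    }
  }
  return worst;
}
```

With this ordering, a single "not-verified" step always wins over pending or network-error steps, so the user sees the most serious finding first.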
