Adding decentralised storage feature to Vereign SEAL

As also raised in our internal primers on W3C Decentralised Identifiers and the DIF Sidetree protocol, one goal for Vereign SEAL is to move toward a scalable approach of self-hosted, corporate and service nodes as part of switching to what we have internally started calling the “Mailtree protocol”, as its design and function are very much influenced by the Sidetree protocol itself.

One of the core components of this kind of approach is the InterPlanetary File System (IPFS), which stores data in an immutable, decentralised and fault-tolerant way. Switching to Mailtree will require all our clients – most importantly the Add-Ins – to connect to IPFS.

For the user experience, IPFS will play a major role especially in terms of speed and convenience.

So we need to get the integration of IPFS right, and we should dedicate a whole product cycle to this topic to make sure we understand all the implications, can measure the different performance impacts, and can make adjustments or develop best practices before the functioning of the entire system depends on this component.

Feature Idea Outline

To the layperson, Vereign SEAL – a.k.a. “verifiable credentials for email”, “digital provenance for email and attachments” – is effectively a better version of registered mail. Digital, decentralised, peer-to-peer, more convenient, cheaper, more efficient and with far higher value in terms of securing evidence.

That is why our marketing will highlight “Registered Mail 2.0”, “Digital Registered Mail” and “Decentralised Registered E-Mail” as themes in order to help wider audiences understand what Vereign SEAL provides. Why do people send registered mail? Most often because they want proof that they provided, sent, or did something.

Traditional registered mail effectively only proves that someone sent an envelope. Vereign SEAL can prove WHAT was sent, including attachments. We can prove this by virtue of hashes which are part of the verifiable credential. But this approach requires that users provide the mail or file itself when trying to prove exactly what was sent. Verification can either be done in the add-in, or requires manual generation and comparison of hashes.

That is not very convenient and may regularly prove too hard to follow for the legal professionals who may be involved in judging whether proof has been provided.

Now imagine that the EMAIL ITSELF, as well as all its ATTACHMENTS were stored encrypted in IPFS.

As a result, the web verification app can display the email message that was sent, and provide the attachments for direct download. Because of the properties of IPFS and because of the way in which the verifiable credential itself is formed and secured against the blockchain, both the mail and its attachments would be GUARANTEED to be EXACTLY as they were sent.

In other words, someone trying to prove this is the contract they signed could just share the verifiable credential with the court and tell them: “Here is what I agreed to. Feel free to download the original contract directly from the receipt.” and it is GUARANTEED to be correct and identical, and extremely easy to use.

Because IPFS is content-addressable storage that only distributes files on demand, we can do this in a way that is compliant, is not easily data mined, and will work in a globally decentralised fashion.

And not only would this be a feature that would add a lot of value to Vereign SEAL immediately, it would also allow us to build practical experience with IPFS, including its performance and how we can ensure that the overall user experience is good.

Considerations for integration

Because speed is of utmost importance, adding IPFS means we should add IPFS locally whenever possible. Doing so makes storing data while sending a LOCAL operation, allowing the clients to proceed with sending more quickly.

Note: For cloud based services, the user's local device may actually be further away from the data. There it might be better to write to an IPFS instance run by Vereign in the corresponding cloud infrastructure – making it as local as possible. So each client will need to take its data flow patterns into account.

Some clients are therefore likely to need to support more than one approach, e.g. Outlook on the desktop vs Microsoft 365. They should have an order of preference and priority to use (highest priority / preference first) for SENDING – a selection sketch follows the list:

  • Local IPFS node in the same cloud, if cloud based – OR –
  • Local IPFS node configured by administrator (e.g. for companies self-hosting)
  • Browser based IPFS node, where the browser offers it, e.g. the built-in IPFS support in Brave
  • Add-In IPFS Node via JS-IPFS
  • Last fallback: IPFS node operated by Vereign itself
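
A minimal selection sketch in TypeScript of how a client might walk that preference list; the endpoint type, health check and URLs are purely illustrative and do not exist in our code yet:

    type IpfsEndpoint = { kind: string; apiUrl: string };

    // Walk the preference list top-down and return the first endpoint that answers.
    async function pickSendingNode(candidates: IpfsEndpoint[], fallback: IpfsEndpoint): Promise<IpfsEndpoint> {
      for (const candidate of candidates) {
        if (await isReachable(candidate)) {
          return candidate; // highest-priority reachable node wins
        }
      }
      return fallback; // last resort: the node operated by Vereign itself
    }

    // Placeholder health check: probe the node's HTTP API with a short request
    // (go-ipfs answers POST /api/v0/id); a real client would also add a timeout.
    async function isReachable(endpoint: IpfsEndpoint): Promise<boolean> {
      try {
        const res = await fetch(`${endpoint.apiUrl}/api/v0/id`, { method: 'POST' });
        return res.ok;
      } catch {
        return false;
      }
    }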

For READING/VERIFYING we can start with the same list, but this is the case that is more likely to be slow, and we may need to play with this and tweak things to work as intended. So clients MAY in fact find themselves with a different approach / list for VERIFYING.

In any case, ALL clients – including the web verification app – should include JS-IPFS by default.

Other Changes

Other places where we need to introduce changes for this feature:

Configuration

Storing the email and/or attachments into IPFS should be optional.

So we may need configuration of default behaviour, or a convenient way to toggle behaviour.

We will likely also need to allow configuration of preferred IPFS node to use, with sane default.

Sending

Sending with storage into IPFS means we need to do the following (a rough code sketch follows the list):

  • generate symmetric encryption key (for AES-GCM, most likely)
  • encrypt message / attachments with key
  • store message / attachments into IPFS
  • store encryption key into verifiable credential / SEAL
  • store URIs of message body and attachments into verifiable credential / SEAL along with the file hashes we currently store
  • send SEAL normally
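
A rough sketch of those steps for a single attachment, using the Web Crypto API and an embedded js-ipfs node; the returned field names are placeholders, not our actual SEAL schema (note that the AES-GCM IV has to travel with the key):

    import { create } from 'ipfs-core'; // js-ipfs node embedded in the add-in

    // Sketch: encrypt an attachment with a fresh AES-GCM key, store the
    // ciphertext in IPFS locally and return what needs to go into the SEAL.
    async function sealAttachment(plaintext: Uint8Array) {
      const key = await crypto.subtle.generateKey(
        { name: 'AES-GCM', length: 256 }, true, ['encrypt', 'decrypt']
      );
      const iv = crypto.getRandomValues(new Uint8Array(12)); // 96-bit nonce for AES-GCM
      const ciphertext = new Uint8Array(
        await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext)
      );

      const ipfs = await create();                // local node, so adding is a LOCAL operation
      const { cid } = await ipfs.add(ciphertext); // the CID is available immediately

      return {
        ipfsUri: `ipfs://${cid.toString()}`,                      // goes into the SEAL
        encryptionKey: await crypto.subtle.exportKey('jwk', key), // goes into the SEAL
        iv: Array.from(iv),                                       // needed again for decryption
      };
    }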

SEAL / Verifiable Credential Data

The data schema for SEAL verifiable credentials therefore needs to be extended to support the following (a hypothetical shape is sketched after the list):

  • AES-GCM symmetric key for content encryption
  • IPFS URI for message in IPFS
  • IPFS URIs for attachments in IPFS
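
For illustration only, the extension could look roughly like this; the field names are invented and would have to be aligned with the actual SEAL schema work:

    // Hypothetical shape of the additional SEAL / verifiable credential fields.
    interface SealStorageExtension {
      // Symmetric AES-GCM key used to encrypt body and attachments (e.g. as a JWK).
      contentEncryptionKey: JsonWebKey;
      // IPFS URI of the encrypted message body, e.g. "ipfs://<CID>".
      messageUri?: string;
      // IPFS URIs of the encrypted attachments, keyed by the file hashes we already store.
      attachmentUris?: Record<string, string>;
    }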

Web Verification App

The web verification app needs to check whether the message body and/or attachments are available, get the key from the SEAL, retrieve the attachments and message, decrypt them, and

  • display the message, if available
  • offer the decrypted attachments for download, if available

Also, this process should be as non-blocking as we can make it (a retrieval sketch follows below).
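
A sketch of that retrieval path, assuming the hypothetical fields above and a public gateway for fetching; the gateway choice and error handling are placeholders:

    // Sketch: fetch an encrypted attachment via a gateway, decrypt it with the
    // key from the SEAL and hand back a Blob the UI can display or offer for download.
    async function fetchAndDecrypt(
      ipfsUri: string,    // e.g. "ipfs://bafy..."
      keyJwk: JsonWebKey, // AES-GCM key taken from the SEAL
      iv: Uint8Array      // nonce stored alongside the key
    ): Promise<Blob> {
      const cid = ipfsUri.replace('ipfs://', '');
      const res = await fetch(`https://cloudflare-ipfs.com/ipfs/${cid}`);
      if (!res.ok) throw new Error(`IPFS fetch failed: ${res.status}`);
      const ciphertext = await res.arrayBuffer();

      const key = await crypto.subtle.importKey('jwk', keyJwk, { name: 'AES-GCM' }, false, ['decrypt']);
      const plaintext = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext);
      return new Blob([plaintext]);
    }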

Your turn

I hope this explains the rationale and intended behaviour well enough to allow everyone to think it through and provide insights into what might have been overlooked, as well as suggestions on exactly how to implement it and how to split up the work.


Highly relevant and interesting post from other thread:

https://community.vereign.com/t/token-idea-personal-professional-email-token-pet/317/6

We should consider building our own IPFS storage API based on

so we can re-use all of

including things like

Hello! This is a follow-up summary of the technical meeting we had yesterday with @georg.greve, @zdravko, @alexey.lunin and me.

Please keep in mind that my thoughts and opinions are constrained by my understanding of the IPFS network and the SEAL project itself, and there are still many, many things which I don’t understand well. Please correct me and expand on this however you see fit.

IPFS

The architecture of IPFS will guide and affect our own architecture for storing and fetching data. We’ve considered the following points.

1. Writing data to IPFS

Writing data to public IPFS gateways is NOT reliable and NOT recommended.

This is understandable, because if a gateway allows everyone to upload content without any authentication, the gateway itself will be overwhelmed with problems - active and passive monitoring for abusive content, storage and bandwidth costs, DoS attacks, etc.
All public gateways are meant for fetching IPFS content, not for uploading new content. Even if a gateway seems to allow content uploading (as of now), there are no guarantees for the reliability or availability of the service. It may stop accepting uploads whenever the owner or rate-limiting algorithms decide to.

This means that we must have our private internal IPFS cluster of nodes for writing data. These nodes won’t be exposed externally and will be accessible from our backend services only.

Extensions (clients) will send the encrypted attachments to an HTTP backend service which will handle the upload to IPFS. The service will require authentication with a Vereign token, so only logged-in users will be allowed to upload data.

Here I see the following challenge: Clients must include the identifiers of the uploaded attachments in the SEAL (hash or CID of the IPFS content), but we don’t want the email client (and the user) to wait for the upload to finish before the email can be sent. I’ll be reading the docs to see if we can calculate the CID before the uploading takes place, so the backend service can respond to the client with the CID immediately - or even better, if the clients themselves can calculate the CID on the data before sending it to the backend, the UX will be best (see the sketch below). This issue comes from the fact that IPFS content is not addressable by a filename that we can generate or specify (as is the case with traditional storage); instead the content is addressed by itself.
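
If the client can indeed compute the CID itself, js-ipfs has historically offered an onlyHash option on add that hashes content without storing or transferring it; a sketch, with the caveat that both sides must use the same add settings (CID version, chunker) for the CIDs to match, which we still need to verify:

    import { create } from 'ipfs-core';

    // Sketch: compute the CID of an attachment locally, without uploading it.
    async function precomputeCid(data: Uint8Array): Promise<string> {
      const ipfs = await create({ offline: true }); // no network needed just to hash
      const { cid } = await ipfs.add(data, { onlyHash: true });
      return cid.toString();
    }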

2. Getting data from IPFS

We discussed different options and tradeoffs and how we can make the experience more optimal.

One option is for clients to always fetch data from our own IPFS cluster. This should have good performance, but is missing the point of decentralized network usage.

@georg.greve suggested some other options and I’ll try to summarize them.

  • We can somehow notify public IPFS gateways like Cloudflare which content is available at our IPFS cluster, so that when they receive requests from clients, they need not search the entire IPFS network for the content, but immediately fetch it from us. This is a great option which we’ll have to research.
  • Clients can have built-in IPFS javascript nodes which can also be preconfigured to look for our content in our IPFS cluster directly. This will probably be a later step as it will require more development on the frontend than we’ll need initially, but it also sounds good for faster access.
  • Clients can upload the files directly to their IPFS javascript node - this sounds good, but I’m not sure how reliable it is. What happens if the user closes his tab/browser after he sends an email and the uploaded content is not yet synced with the IPFS network and our IPFS cluster? Will the IPFS module still work under the hood with the tab closed? And with the browser itself closed, it will almost certainly not work?
  • Another good option he suggested is a feature of the IPFS network which can be used to trigger a caching event on a public gateway for a particular piece of content (CID). This will be very useful as most emails are opened by recipients relatively soon after they’ve been sent (e.g. 1-5 days), so this caching may speed up the fetching of data significantly (see the sketch below).
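
The cache-warming idea in the last point could be as simple as requesting the CID through the gateways right after the SEAL is sent; a sketch (whether a HEAD request is enough to trigger caching, or a full GET is needed, has to be tested, and the gateway list is only an example):

    // Sketch: "warm" public gateways for a CID right after sending, so the
    // recipient's later request is more likely to be served quickly.
    async function warmGatewayCache(cid: string): Promise<void> {
      const gateways = [
        `https://cloudflare-ipfs.com/ipfs/${cid}`,
        `https://ipfs.io/ipfs/${cid}`,
      ];
      // Fire-and-forget requests; failures are non-fatal.
      await Promise.allSettled(gateways.map((url) => fetch(url, { method: 'HEAD' })));
    }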

Operational Challenges

  • We’ll still need to store all of the data ourselves forever (or 10 years), because we cannot force the network to store it for us. This means we’ll either need to pay for conventional storage and/or use 3rd party services that will store the data for us under an agreement, so that we can be sure that nothing disappears.

  • We’ll need to have scheduled backups of the data (as with conventional storage).

  • We’ll need to administrate and operate a (secure) IPFS cluster with its own storage (sysadmin, devops work).

Please comment if I missed or incorrectly described something. The input from the frontend team will also be very helpful as a lot of work will happen there, especially if we want to implement IPFS nodes in the client extensions.


Thank you for the summary, @luben !

That is why I believe all add-ins should include IPFS nodes by default, see “Considerations for integration” in the original post.

That way writing is instantaneous, because it is local, and the CID is available right away for sending.

The data so written to IPFS can then be synchronised to the network asynchronously as the mail itself is getting wrapped up and sent. FWIW, we can upload to IPFS the moment something gets attached, so likely several seconds, perhaps even minutes, before something is sent.

In any case, sending a mail should always trigger a synchronisation with pinning to our own IPFS cluster, which can proceed asynchronously in the background, as described.

Note: There will be a short window of potential data loss, basically a race condition of “user sends mail and then immediately uninstalls add-in including the local IPFS node before the data could be synchronised” - but we may be able to mitigate this condition in a couple of ways, plus it does not seem like a very likely path of action for a normal user.

Be that as it may: There is a strong incentive for us to always keep things synchronised, and thus trigger synchronisation as quickly as possible whenever sending mail – if only to make sure the gateways along the path and the recipient have the required data available to process the sealed message.

This would seem to spell the following technical steps:

  • Add IPFS nodes to our add-ins, which translates to adding js-ipfs (GitHub: ipfs/js-ipfs, the IPFS implementation in JavaScript) to our add-ins
  • Create IPFS cluster for Vereign that we can synchronise data to
  • Build API that allows us to
    • Trigger SYNC & PIN (with time parameter)
    • Write data to our IPFS cluster & PIN (with time parameter)

That API must require authentication, and as written above, I would propose to re-use or re-build

https://nft.storage/api-docs/

for this API, extending it for the “SYNC & PIN” operation.

FWIW, we also want to add payment to this operation, as part of our work on https://community.vereign.com/t/technical-questions-in-preparation-of-the-token-sale/314/2 but that can likely happen in the second step.

In the first step I think it is crucial that we build this out as an attachment/body feature on top of our existing product, allowing us to get practical experience with all the implications and pitfalls. All this work will then be useful for our work around the token sale, as well as switching to full Mailtree mode.


Besides leveraging things like Filecoin in ways similar to

and others, we may also continue to use Backblaze for this through a combination of

and

which would be the smallest possible change, and would allow us to benefit from the extremely advantageous Backblaze storage costs.

My preference would be to go this path at first, as it would allow us to provide this service for the time being similar to what Protocol Labs does with


Using Backblaze should actually eliminate or at least dramatically de-prioritise that requirement for the moment…


Hello, I’m writing a summary of what we’ve done so far for the IPFS storage feature.

Design

The following picture presents a high-level overview of the IPFS architecture.

IPFS Service

It is a Go backend service that processes all client requests. It exposes a thin API layer above the IPFS API, and clients communicate directly with this service only when uploading content. The API of the IPFS cluster is not visible from the internet and can only be accessed by our internal backend services.

This front-facing IPFS service enables us to:

  • authenticate requests
  • implement business logic, for example automatically announcing content that has been uploaded
  • provide more sensible log entries and responses related to our business logic
  • validate and rate limit requests
  • attach metrics and tracing to IPFS operations
  • decouple and hide all implementation details of the IPFS API and its future changes

What we have so far

The work can be grouped by functionalities on the frontend (clients) and backend (ipfs service + ipfs cluster).

I can describe the progress of the backend functionalities and @alexey.lunin might describe what is happening on the frontend.

On the backend we have a local dev environment with docker-compose which contains:

  • a container running an IPFS node
  • a container running the IPFS service with an exposed HTTP API

The implemented functionalities so far are:

  • request authentication
  • content upload
  • content download
  • announce content to the IPFS network which should help with query performance

The last point is not tested in real conditions, because announcements only work when our IPFS node(s) are at least partly visible on the internet, which cannot be done with a local dev environment. @zdravko put a lot of effort into deploying the backend parts on k8s, but we still don’t know how to expose port 4001 of the IPFS nodes so that they can participate in the content routing and announcements of the global IPFS network. He still has some ideas that he will try, but this remains a WIP for now.
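
For reference, triggering such an announcement against a node's HTTP API is roughly a single RPC call; a sketch assuming go-ipfs defaults (API port 5001, the dht/provide endpoint), which may differ from how our service actually does it:

    // Sketch: ask an IPFS node to announce/provide a CID to the DHT.
    async function announceCid(cid: string, apiBase = 'http://127.0.0.1:5001/api/v0'): Promise<void> {
      const res = await fetch(`${apiBase}/dht/provide?arg=${cid}`, { method: 'POST' });
      if (!res.ok) {
        throw new Error(`provide failed: ${res.status} ${await res.text()}`);
      }
    }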

What remains to be done

  • Secure and properly configured IPFS cluster for dev/staging/prod environments
  • Configuration of storage mechanism for the IPFS cluster (e.g. backblaze or mounted storage in the Cloud)
  • Deployments for the IPFS service on dev/staging/prod environment
  • Rate limiting, metrics and tracing in case we decide to go in production
  • Frontend development for encrypting and storing attachments in IPFS
  • Frontend development for fetching and decrypting attachments from IPFS

Unfortunately the last point is very uncertain and probably from now on it’s best to focus on it, because if it turns out that announcements and content discovery don’t work well, I guess it may change our plans to use IPFS as a whole. I mean, if a user has to wait 30-60-120+ seconds to see a web page, then this feature probably won’t make much sense. To test this we need to deploy an IPFS node in the Cloud, open its port 4001 to the internet, and try various node configuration options + announcements.

To wrap up: we have an API to upload, download and announce content. We need to further configure a cluster and do the frontend part.

Please comment if you have questions and suggestions.

@georg.greve @kalincanov

Hi @luben - thank you for this, this is great progress!

Some questions and remarks from my side:

PIN as part of API

We all know that performance is going to be critical. Which is why the initial outline assumed we would write data locally on the device sealing the message. Of course right now our add-ins do not have IPFS yet, but the way this SHOULD work is to write locally, and then authenticate toward our service to request to PIN (= synchronise, provide and make permanent) this information.

So I would expect our API to also have operations for

  • PIN
  • UNPIN

where PIN should likely have a time component to it, e.g. “PIN for 10 years”, which should trigger the fetching of that information and subsequent pinning. IPFS already has ready-made components for this; in fact there are commercial and free pinning services available right now. See

and

for more information.

Since pinning is a resource-costly operation, all these pinning services typically require access keys, which they then use to map requests to accounts with built-in accounting based on volume.
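
To make the intended shape concrete, the PIN/UNPIN operations with a time component might look roughly like this from a client's perspective; every path, parameter and URL here is hypothetical, not an existing spec:

    // Hypothetical client calls for the proposed Vereign pinning API.
    async function requestPin(cid: string, retentionYears: number, token: string,
                              apiBase = 'https://ipfs-api.vereign.example'): Promise<void> {
      const res = await fetch(`${apiBase}/pins`, {
        method: 'POST',
        headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
        // "PIN for 10 years" expressed as a retention period in the request body.
        body: JSON.stringify({ cid, retention: `${retentionYears}y` }),
      });
      if (!res.ok) throw new Error(`PIN request failed: ${res.status}`);
    }

    async function requestUnpin(cid: string, token: string,
                                apiBase = 'https://ipfs-api.vereign.example'): Promise<void> {
      const res = await fetch(`${apiBase}/pins/${cid}`, {
        method: 'DELETE',
        headers: { Authorization: `Bearer ${token}` },
      });
      if (!res.ok) throw new Error(`UNPIN request failed: ${res.status}`);
    }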

Thought on PINNING

We might even consider using more than one pinning service for redundancy or allow people to select their own pinning service preferences as additional features.

Performance of LOCAL + PIN vs UPLOAD and PIN

LOCAL + PIN makes sense where files are on a local device that has “imperfect” bandwidth because otherwise sending would involve waiting for all the uploads to be finished. In these situations, LOCAL + PIN allows us to send right away, and “lazy sync” after the mail has been sent.

But where data is already in the cloud because it has been uploaded during drafting stage, or because it was attached as a link to cloud data, pulling it down to then write it locally only to then synchronise it back to IPFS again makes no sense.

So where such data is already in the cloud, we should use the API to transfer “cloud to cloud” in order to not depend on poor local bandwidth.

API spec might need “PIN time” argument

Like for the “PIN” operation, the “UPLOAD” operation likely also needs a “and pin” or a “and pin for time period X” argument.

Peering for Performance

If we want to host our own IPFS service, or use Cloudflare (or similar) for this function, we want data to be available as quickly as possible.

IPFS has a notion of peering, which basically means a constant connection is kept between the local node and another node that we know has data we are interested in. This is a configuration item, see

All our clients should keep permanent peering with all the IPFS nodes we know to hold SEAL data, I believe – be it our own and/or one we operate via Cloudflare.

Performance: Look local + Download

When accessing data, we should always have parallel requests to get the data we are looking for locally, as well as downloading it from the “IPFS cloud service.”

If local comes back right away, we can already use that data and can abort the download operation.

If local does not have it, we need to wait for the download to finish.

But the request to local will likely also trigger a sync to local so we can re-use the data when needing it again, which is not uncommon: If you’ve looked at this mail today, chances are you will have another look in the next 7 days or so. With parallel requests and a “first winner” approach we can use IPFS as a dynamic cache for data we are likely to need again.
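
A sketch of that “first winner” retrieval, assuming an embedded js-ipfs node plus a gateway; in practice we would add timeouts and abort the losing request:

    import type { IPFS } from 'ipfs-core-types';

    // Sketch: race a local IPFS lookup against a gateway download and use
    // whichever returns first.
    async function fetchFirstWinner(ipfs: IPFS, cid: string): Promise<Uint8Array> {
      const fromLocal = (async () => {
        const chunks: Uint8Array[] = [];
        for await (const chunk of ipfs.cat(cid)) {
          chunks.push(chunk);
        }
        return concat(chunks);
      })();

      const fromGateway = (async () => {
        const res = await fetch(`https://cloudflare-ipfs.com/ipfs/${cid}`);
        if (!res.ok) throw new Error(`gateway: ${res.status}`);
        return new Uint8Array(await res.arrayBuffer());
      })();

      // Promise.any resolves with the first promise that fulfils ("first winner").
      return Promise.any([fromLocal, fromGateway]);
    }

    function concat(chunks: Uint8Array[]): Uint8Array {
      const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
      let offset = 0;
      for (const c of chunks) { out.set(c, offset); offset += c.length; }
      return out;
    }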


So far from my side. I hope all of this makes sense.

If you have questions, you know where to find me. :smiley:


By the way: I tried this out with the IPFS node in my Brave browser, peering it against Cloudflare as provided in the example.

Uploaded a PDF, got its share link (https://ipfs.io/ipfs/XXX) and then accessed it via https://cloudflare-ipfs.com/ipfs/XXX. The PDF was new, freshly uploaded, but my other browser (Google Chrome) pulled its data in a second or two. It wasn’t noticeably slower than normal web pages.

Promising.

Hi Georg!

I don’t want to sound contrarian or negative and I believe that the discussions and questions that you’re raising are immensely valuable for all of us to understand how to implement and think about these features and technologies.

I just want to share what’s bothering me with the IPFS implementation and my current understanding of the IPFS network and its capabilities. What follows may be partially (or completely) wrong, but that’s why we discuss and learn things :slight_smile:

I’ll try to write answers by quoting different parts of your post.

Our current API has pinning enabled (forgot to mention it yesterday), and when content is uploaded to the IPFS service, it’s also pinned to our IPFS node (currently pinning is without expiration). As of now the API doesn’t have an endpoint to UNPIN content, but it will be very easy to add whenever we need it.

As far as I understand, you can only pin content to nodes to which you’re effectively uploading the content. So pinning goes hand-in-hand with uploading data. We cannot just pin data to a node without uploading the data to that node. So in order for anyone to pin content to a 3rd party service, he’ll have to somehow authenticate and be authorized to upload, which is a paid-for service. After the client/user/business has entered into an agreement with a 3rd party to upload and store content there, content upload happens in the same way as we’re now uploading content to our IPFS nodes.

We can re-upload and pin content from our IPFS nodes to a 3rd party node for redundancy, but we’ll have to pay their storage price. As far as I see it, this can only be achieved service-2-service. I don’t see how browser nodes can authenticate against a 3rd party securely and upload data there directly, because if authentication keys are in the browser, they are effectively not confidential. So pinning and uploading from a browser extension to a 3rd party node can only happen by proxying the data through some backend services - which means that we can upload the data both to our IPFS nodes and 3rd party nodes (which effectively means that the browser only uploads data to our IPFS service).

It would be good if we can still make the uploading of content to our IPFS nodes async and send the email quickly, without using a local IPFS node, because for the moment I can’t see the benefits of having a local IPFS node in the extensions or the browser, since effectively this node is not reachable from the internet. From what I understand, a local node cannot be a peer to the other nodes of the IPFS network, because they cannot initiate a request back to it when they want to fetch content: javascript - Listen to http request on webpage - Stack Overflow

I mentioned yesterday that if an IPFS node wants to advertise to the network that it has some content, it has to be reachable from the internet. Nodes that receive the announcement request will try to open a connection back to the advertising node, and if they can’t connect, they will not create a routing record for that content (the announcement will be ignored). I assume this is what will happen with the IPFS browser nodes - the content that they have will be useless and unreachable. I may be wrong about that, but this is what I’ve found so far.

This seems to work on a good-will basis, because the fact that you configured your node to open long-lived TCP connections to other nodes and public IPFS gateways does not mean that they will honor your requests. They will frequently recycle/refuse/drop connections in order to operate their service more efficiently, and we cannot guarantee that we’ll have a stable connection with these providers. We can configure our nodes to try to make these connections, but it’s up to the other party to accept and support a long-lived connection. Even so, these configuration options may be useful for more efficient content discovery and routing.

Here I’m a little bit lost on the meaning of “locally”. I imagine the following scenarios:

  1. A recipient of the email has direct access to the attachments in the email, so he doesn’t need to download them - so here we have nothing to do with IPFS.

  2. The web verification app must download the attachments from IPFS.

If we assume that the user’s browser that opens the verification app has a built-in local IPFS node, I can’t see how the attachments will ever be there in this local node. It seems to me that looking for the data locally will never yield results. The web app will always have to fetch the data from a public IPFS service or our own IPFS service.

Next, the web app can start concurrent requests to public IPFS services and our IPFS service, but this also doesn’t seem to make sense, because our IPFS service will respond immediately with the content, while public gateways will have to find the route to our node, fetch the data from there and restream it. I suppose it will always be orders of magnitude faster to get the data directly from our service, because our service represents the only node(s) on IPFS which have the content. Even in cases where the public gateway has a direct record and knows that it must fetch the data from us instantly, this will still be an operation that is placed in a queue and effectively restreams the content from us. So I think that streaming the data directly to the client will always be faster than streaming the data to another node which will then stream it to the client.

To wrap up: The IPFS network as I currently see it doesn’t have any incentives for anyone to store anyone else’s content. And in general nobody stores anybody else’s content. It so happens that if multiple parties/businesses/people store the same information, like for example a huge public dataset, or the internet archive, or some other valuable public information, then this distributed information is redundantly dispersed and can be retrieved and exchanged between clients and servers more efficiently. This, as I understand it, is the purpose and strength of IPFS: multiple parties hosting the same content without any coordination makes access to the content more efficient, and at the same time each of them has a copy, which increases the redundancy.

However, for the specific information of a company, which has no structure because everything is encrypted, no one in the network will store even a single byte of our content unless we pay them to, and even when we pay them to, our clients will still have to go through our backend services for authentication.

We can use a local in-browser node to experiment with lazy syncing from a client to our nodes, but besides that use case, for the moment I cannot see what else we can do with a browser node. And if this turns out to be the case, it will be best if we can async upload the data without using a local IPFS node.

Please excuse me for the cold shower thoughts on this topic, but it’s how I currently see the IPFS stuff. If I’m wrong, I’d happily change and evolve my understanding :zipper_mouth_face:

Hi @luben - no worries.

Discussing these things to make sure we’re all on the same page is the reason for this forum.

Excellent.

See the links I shared yesterday.

From what I understand, those pinning services work via a RESTful API over which you submit the CID/hash of what you want them to PIN. They then request that data, and pin it for you, accounting the storage required to your account with them via the authentication token you need to submit such a request.

The node needs to have the data to pin it, yes. But it can easily get the data by requesting it, followed by a PIN operation on that data. In fact you can find just that information in the documentation shared yesterday:

https://ipfs.github.io/pinning-services-api-spec/#section/The-pin-lifecycle/Checking-status-of-in-progress-pinning
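
For reference, the flow in that spec boils down to submitting the CID (plus optional origin hints) with a bearer token, after which the service fetches and pins the content itself; roughly (service URL, name and origins are placeholders):

    // Sketch of a request following the IPFS Pinning Service API spec linked above.
    async function pinViaRemoteService(cid: string, accessToken: string, serviceBase: string) {
      const res = await fetch(`${serviceBase}/pins`, {
        method: 'POST',
        headers: { Authorization: `Bearer ${accessToken}`, 'Content-Type': 'application/json' },
        body: JSON.stringify({
          cid,
          name: 'seal-attachment',                    // optional human-readable label
          origins: ['/dnsaddr/ipfs.vereign.example'], // optional hint where the data already lives
        }),
      });
      // Response is a PinStatus object: { requestid, status: 'queued' | 'pinning' | 'pinned' | 'failed', ... }
      return res.json();
    }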

Yes, if we ourselves wanted to also use a 3rd party pinning service, we would have to pay for that. In that case we’d have to price this in. But of course our USERS might choose to use additional / 3rd party pinning services for which they would be happy to pay themselves and it would do nothing to impact the functioning of our system.

That is evidently false, since this would otherwise not have worked:

My browser is behind two firewalls, has no open ports, nor a public IP address, yet it works perfectly serving data up to the IPFS network.

That is a StackOverflow post by someone from 2016 about HTTP in general and does not seem related to IPFS, at all. I fail to see how this would be relevant to what we’re discussing?

IPFS uses peering with publicly reachable nodes for its distribution method, and keeps those channels open for a while, refreshing them occasionally, but when told to do so, it can also maintain permanent connections to other nodes, i.e. Cloudflare, and can receive requests over those connections.

In fact, the whole premise/idea of IPFS is to have nodes distributed across all kinds of devices and browsers in order to allow peering and distribution via IPFS to any application, anywhere. That is why you have apps that create IPFS nodes for mobile phones - which also don’t have public IP addresses, typically.

Using IPFS properly means storing/retrieving data LOCALLY.

The benefits are in speed of operation, independence from current bandwidth situation, implicit caching of data, reduction of transmitting the same things repeatedly and so on and so forth.

Please see again the original post, specifically the Considerations for integration.

We should ALWAYS prefer local first, in an order of

  • same network (if configured)
  • same device (if configured)
  • same application (default)

and then fall back on the service you developed as the last resort only.

The only exception to this is when the data is already in the cloud because we do not want to download in order to then store into IPFS, which will then upload it again.

It does not require configuration, except for potential speed gains.

There are always plenty of nodes happy to be connected to you. My browser has hundreds of nodes connected worldwide right now. As to the likelihood of Cloudflare as a business no longer wanting to participate in the IPFS community, I guess that is possible. But it also seems rather unlikely, especially given their moves towards offering storage now:

This is a near perfect permanent storage layer for IPFS, and Cloudflare’s business is as a CDN.

So it would seem odd that they would suddenly stop distributing IPFS when this is where a lot of the innovation is happening and they themselves have been pretty early and involved in this, from the looks of it.

But even if they did: It would not invalidate any of the things discussed here.

Once data is in IPFS, we can pin it anywhere, our clients will always be able to retrieve it from IPFS over their local nodes, and the Vereign IPFS service will keep it pinned for as long as required. And even if we were to use Cloudflare for our own pinning service, then we could switch to another one, or build one ourselves should they ever become either hostile or no longer willing to support IPFS.

That’s the beauty of a heterogeneous, growing ecosystem of providers.

I would start this all by:

  • having a Vereign IPFS node and pinning service
    • this node should be configured to maintain permanent connections to Cloudflare and ipfs.io, at least, perhaps some others as well. This is pure configuration, so easily updated
  • having all clients incorporate local IPFS nodes configured to permanently peer with
    • the Vereign IPFS node
    • the Cloudflare IPFS service

and then have local clients request data in parallel from

  • local node
  • Cloudflare

and only fall back on the Vereign IPFS node if neither responds in a reasonable time. That way we protect the bandwidth in the data centre, and use Cloudflare as much as possible.

You are right that for this first step of feature development, local nodes are less useful for data retrieval, at least in the first iteration.

But please keep in mind that this is only the first step and our chance to try out a core component of what we want to do with Mailtree, where attachments are not the only things that will go into IPFS: the data required to verify the seals and the read receipts will also be in IPFS.

So this is our chance to experiment with local IPFS node integration and usage so we don’t need to take that step once we move to Mailtree and will have a lot more moving parts.

As for the web verification app: Since it is not a persistent application used repeatedly, adding an IPFS node does not seem to add any benefit right now, I agree. Here I would probably default to Cloudflare, and fall back on the Vereign IPFS service.

You need the IPFS CID in order to generate the SEAL.

So there is no way to generate the Seal first and wait for attachments to be asynchronously uploaded afterwards / while you are sending the mail. You are always blocked on upload to IPFS.

Which is why writing speed for attachments is crucial for the user experience. Local write will always be faster than network. And IPFS has the special property that we are getting the correct, permanent CID immediately so we can generate the SEAL right away and send in a matter of seconds – regardless of whether the attachments have already been uploaded / synchronised.

Does this all make more sense now?

Ok, sounds good, let’s try it and see if it can work!

I still can’t understand how they do it with the local node and I’ll try to find more info on the internet. Unfortunately I also cannot find the forum post or internet page where I read about the need to expose the IPFS node’s port 4001 to the internet in order for other nodes to accept content announcements. I’ll post it here if I manage to find it again.

My assumption that the node should act as a server with port 4001 open comes from the post that I read, plus my experiments with the local node I have. I uploaded content, announced that content to the network, and later I was not able to fetch the content from anywhere on the network except my own node: neither Cloudflare nor ipfs.io returned the data after I waited and retried many times.

I think we don’t need to wait for the content to be uploaded to know what the CID is. There are JS (and Go) libraries which can calculate the CID before any uploading is initiated, just by hashing the content. For a 10-20 megabyte attachment that should be an operation of a few hundred milliseconds on the client side, so I assume it won’t be a problem.

I think we can make the client calculate the CID, put it in the SEAL and later upload the content. I think this is also how nodes validate that they have received the correct content, because the network is untrusted (or let’s say trustless). When we request content by CID from other nodes, we don’t know if the bytes which they return are the correct content that we need, so the client calculates the CID from the received bytes to validate that they match the CID that it has requested. So this operation should be relatively cheap to perform and independent of the uploading itself.

Are you sure that you have not uploaded a file which happens to also be uploaded by some other node in the network, and that’s the reason that you were able to fetch it? Is it a truly unique PDF file which only you have?

IPFS has a mix of nodes that have open ports, which are the ones that can be discovered, and nodes that do not. The ones without discoverable ports connect to the ones with discoverable ports, and they all sync data amongst themselves.

But the data path can be convoluted and complex if you run a local node that does not have an exposed port, given that it will opportunistically connect to nodes that have exposed ports based on what it finds – and if someone is looking for data on it, that request needs to reach a node that just happens to be connected to the one that you are hosting without exposed port.

In my experience it can take a while for data to become available on any random node if you run a node without exposed ports. Unless you have stable / persistent peering set up for your local node, that is.

It always shows up eventually but the time cannot be planned.

But if you set up constant peering, e.g. to Cloudflare, things get really fast. Ideally you always maintain constant peering to the nodes you would be using for data retrieval or pinning… as those are then just one hop away from the data you are looking for.

You had your IPFS node set up to connect to the global IPFS network (not a private network, and not network limited) and enabled constant peering / connection to Cloudflare and ipfs.io?

That is useful, but it raises the question: why take on the hard task of asynchronous data synchronisation, including things like disconnects and resumes, when IPFS is dedicated to and optimised for exactly that task?

Not to mention: IPFS also provides a local caching layer for this kind of data when connectivity is (temporarily) interrupted, allowing us to process data and show results even while offline or with limited connectivity.

Ultimately it is about using the right tool in the right way. The right way of using IPFS is via distributed nodes for the clients, and larger, port accessible nodes for sharing, distribution and persistence/pinning.

This is also a matter of redundancy and resilience. By using the local IPFS nodes, mail sending and sealing will always work as expected, even if the Vereign IPFS service is temporarily having issues, and no data would be lost. And once we are moving to Mailtree, we should be able to build the entire system to no longer have single points of failure.

Uhm… isn’t that the whole point of Content Addressable Storage?

The only way you could realistically expect that data NOT to be the data we expect would be if there was a hash collision. Which seems pretty unlikely.

And if sha256 were to be compromised, IPFS could update seamlessly:

So if it has the correct address (= the correct hash) then it is with near certainty the right file.

Yes.

Here is another one, served straight from my Brave browsers local IPFS node:

https://cloudflare-ipfs.com/ipfs/QmS4vvhsVvkubbyKq8p5YiPwRHFciby5hPeMgiZeuTN9Sg?filename=facebook-down.png

See how it actually got cached and even got embedded into this thread?

Here is my peering section right now:

	"Peering": {
	"Peers": [
		{
			"Addrs": [
				"/ip6/2606:4700:60::6/tcp/4009",
				"/ip4/172.65.0.13/tcp/4009"
			],
			"ID": "QmcfgsJsMtx6qJb74akCw1M24X1zFwgGo11h1cuhwQjtJP"
		},
		{
			"Addrs": [
				"/dns/cluster0.fsn.dwebops.pub"
			],
			"ID": "QmUEMvxS2e7iDrereVYc5SWPauXPyNwxcy9BXZrC1QTcHE"
		},
		{
			"Addrs": [
				"/dns/cluster1.fsn.dwebops.pub"
			],
			"ID": "QmNSYxZAiJHeLdkBg38roksAR9So7Y5eojks1yjEcUtZ7i"
		},
		{
			"Addrs": [
				"/dns/cluster2.fsn.dwebops.pub"
			],
			"ID": "QmUd6zHcbkbcs7SMxwLs48qZVX3vpcM8errYS7xEczwRMA"
		},
		{
			"Addrs": [
				"/dns/cluster3.fsn.dwebops.pub"
			],
			"ID": "QmbVWZQhCGrS7DhgLqWbgvdmKN7JueKCREVanfnVpgyq8x"
		},
		{
			"Addrs": [
				"/dns/cluster4.fsn.dwebops.pub"
			],
			"ID": "QmdnXwLrC8p1ueiq2Qya8joNvk3TVVDAut7PrikmZwubtR"
		},
		{
			"Addrs": [
				"/dns4/nft-storage-am6.nft.dwebops.net/tcp/18402"
			],
			"ID": "12D3KooWCRscMgHgEo3ojm8ovzheydpvTEqsDtq7Vby38cMHrYjt"
		},
		{
			"Addrs": [
				"/dns4/nft-storage-dc13.nft.dwebops.net/tcp/18402"
			],
			"ID": "12D3KooWQtpvNvUYFzAo1cRYkydgk15JrMSHp6B6oujqgYSnvsVm"
		},
		{
			"Addrs": [
				"/dns4/nft-storage-sv15.nft.dwebops.net/tcp/18402"
			],
			"ID": "12D3KooWQcgCwNCTYkyLXXQSZuL5ry1TzpM8PRe9dKddfsk1BxXZ"
		},
		{
			"Addrs": [
				"/ip4/104.210.43.77"
			],
			"ID": "QmR69wtWUMm1TWnmuD4JqC1TWLZcc8iR2KrTenfZZbiztd"
		}
	]
},

and also I have set

	"Reprovider": {
	"Interval": "5m",
	"Strategy": "all"
},

so it re-announces the availability of data every 5 minutes, and not just every 12 hours.

See

for more information.


Thank you very much, this is all very helpful and I’ll use it in my node.

About the local nodes: are we expecting the users to manually run a local node in the browser, or do we want to embed a node in the extension, so that every user will have an IPFS node automatically?

As for the issue with the CID, I agree that hash collisions are not a problem we should discuss.

What I wrote was intended to illustrate that generating the CID at any point in time, by anyone who has the content, is a relatively cheap (almost instant) operation.

And the example was that clients should (and I’m sure the IPFS implementations do so by default) verify the content they receive by hashing it after it arrives. It’s an open network and no one can know what a piece of software on a remote computer will send back in response to a /get/cid request. The only way for the client to know if the content is valid is to hash it and compare it against the CID it requested in the first place.

Wonderful. :heart_eyes:

Yes.

See start of the thread. Our add-ins should bundle

unless there is a better alternative that is more suitable to our goals – if you look at Brave, they are bundling

into the browser itself, including some default configuration.

That is exactly what we should be doing, including providing a sane default configuration to make IPFS work well for our use case. Because then we know we have a sane IPFS node with sensible defaults locally.

I see. But I would also guess that IPFS does this by default.

So I would first want to verify whether this is already happening, so we don’t do it twice. Inexpensive or not, duplication of effort seems pointless.
