FILE and GET messages

Post by **M&B** » April 12, 2005, 11:16 pm

I noticed that we're supposed to work out where a FILE should go by caching the corresponding GET message and then looking the path back up on the file's return trip.

However, the only data by which we can look up the cached GET message is the file name and file length. What if there are two concurrent GET messages for the same file with the same file length? There wouldn't really be a way to disambiguate between the two files. We could assume that the files are identical and just serve them to whoever is on top as we get them, like a stack, but that still seems like a risky game.

Could we perhaps have the hop path in the FILE response as well?

Post by **jaf656s** » April 13, 2005, 5:19 pm

This is true, but for this project we are working under the assumption that a filename uniquely identifies a file. If this were not the case, then other identification measures would be necessary as you have stated.

-Jason

Post by **Matt** » April 13, 2005, 11:16 pm

Well, this isn't about the file so much as it is about the user trying to download it. If we keep a bigger picture, we automatically realize that file identification would be necessary. However, that could also be taken care of in the GET message. In particular, I'm focusing on the FILE message allowing for mix ups. Would it perhaps make sense to include the path in the FILE message as well as the GET message?

If the path were in the FILE message, it would remove the ambiguity in identifying both files and requesters, and it would also remove the need to cache GET messages (which would save a lot of extra work).
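To make the idea concrete, here is a minimal sketch of a FILE message that carries its own return path. It is not part of the project spec: the `FileMessage` class, its `return_path` field, and the helper names are all hypothetical, and it assumes the GET path is a list of node names with the requester first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileMessage:
    filename: str
    length: int
    # Hypothetical field: nodes still to visit, next hop first.
    return_path: List[str]


def make_file_reply(filename: str, length: int, get_path: List[str]) -> FileMessage:
    """Build the reply at the source by reversing the path the GET travelled."""
    return FileMessage(filename, length, list(reversed(get_path)))


def next_hop(msg: FileMessage) -> Optional[str]:
    """Pop and return the next hop, or None if the message has arrived."""
    if not msg.return_path:
        return None
    return msg.return_path.pop(0)


# A GET that went A -> B -> C -> source comes back as C, B, A.
reply = make_file_reply("song.mp3", 1024, ["A", "B", "C"])
```

Each relay would call `next_hop` and forward to whatever it returns, so no node needs a GET cache. The cost is a larger header on every FILE message.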

If there's a benefit to not including the path in the FILE message, I just missed it and I'll have to thank you for the patience.

Post by **Paco103** » April 14, 2005, 3:05 pm

The only reason the path is included in the search results, and therefore in GET messages, is the assumption that a large majority (probably over 90%) of search hits will never be requested. For search requests (which you already have to implement a lookup table for, so the work should already be done), everything that goes out is assumed to be coming back, so it makes more sense to store the routing information locally than to send larger packets (which in a large network could get quite large).

You are right that it would be better to include more identifying information in a real-world case. If we did that, we would probably use the search ID method. However, in this project we are assuming that file names are unique identifiers, so no further information should be needed. The only way (with this assumption) a mix-up could happen is with partial file requests. To handle this, you should probably cache the offset and segment length as well. If the filename, offset, and segment length are the same, then the stream can be assumed to be the same.
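As a rough sketch of that caching scheme, the lookup table could be keyed on the (filename, offset, segment length) triple. The `GetCache` class and its method names are made up for illustration, and serving waiting requesters in first-come order is just one possible choice, since matching streams are assumed identical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

Key = Tuple[str, int, int]  # (filename, offset, segment_length)


class GetCache:
    """Remembers who asked for which segment so FILE replies can be routed back."""

    def __init__(self) -> None:
        self._pending: Dict[Key, Deque[str]] = defaultdict(deque)

    def remember(self, filename: str, offset: int, seg_len: int, requester: str) -> None:
        """Record a GET as it passes through this node."""
        self._pending[(filename, offset, seg_len)].append(requester)

    def resolve(self, filename: str, offset: int, seg_len: int) -> Optional[str]:
        """Return the oldest requester waiting on this segment, or None."""
        key = (filename, offset, seg_len)
        waiting = self._pending.get(key)
        if not waiting:
            return None
        requester = waiting.popleft()
        if not waiting:
            del self._pending[key]  # drop empty entries so the table stays small
        return requester
```

Two partial requests for different segments of the same file now land in separate entries, so they can no longer be confused with each other.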

Also, if two clients request the same file through your node, you could serve both of them from the first incoming source stream, and disconnect the second stream as soon as you parse its header and see that it is the same file stream you are already receiving.
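One way that bookkeeping might look is sketched below. The `StreamFanout` class and its methods are hypothetical, and the actual socket handling is left out. The sketch only decides which clients a newly arrived stream should feed, and whether the stream is redundant.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, int]  # (filename, offset, segment_length)


class StreamFanout:
    """Tracks which clients wait on a stream and which streams are in progress."""

    def __init__(self) -> None:
        self._waiting: Dict[Key, List[str]] = {}
        self._active: Dict[Key, List[str]] = {}

    def add_request(self, key: Key, client: str) -> None:
        """Register a client that sent a GET for this segment."""
        self._waiting.setdefault(key, []).append(client)

    def on_stream_header(self, key: Key) -> List[str]:
        """Call after parsing an incoming FILE header.

        Returns the clients to feed from this stream. An empty list means
        the stream is redundant and the caller can disconnect it.
        """
        if key in self._active:
            return []  # already receiving this exact stream
        clients = self._waiting.pop(key, [])
        if clients:
            self._active[key] = clients
        return clients

    def on_stream_end(self, key: Key) -> None:
        """Forget a stream once it has been fully relayed."""
        self._active.pop(key, None)
```

With two clients registered for the same key, the first header returns both of them and the second header returns an empty list, which is the cue to drop that connection.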