World domination: Step 1: Reddit

As part of our plan for Direct Connect world domination, the DirectConnect subreddit has been created! The intent is for it to aggregate Direct Connect links and posts, and to let people use a well-known site as a source of information for their DC content.

Go there and write a post!

Don’t forget that you can make topic suggestions for blog posts in our “Blog Topic Suggestion Box!”

ADC Recommendations

A while back (a really long time ago, it appears), I started the document ADC Recommendations. The intent is a document that can be consulted for best practices, common implementations and other useful information that need not be in the official specification(s).

My intent was also for the document, once done, to be updated more frequently than the specifications. That way it can quickly reference the latest software without requiring a new specification version merely to update guidance.

If you want to add more or revise the existing content, leave a comment below or go to the ADCPortal forum post.


Vista SP1 loses MS support and thus DC++ support

As of July 12, 2011, Microsoft no longer supports Vista SP1. DC++ will likely continue to run fine, but users on Vista SP1 may not obtain support.

bzip2 remains optimal for filelists

Several open-source file compressors are available. Currently, DC++ uses bzip2 with the default 900k block size, corresponding to bzip2 -9. Potential free alternatives include other bzip2 block sizes, zlib, xz, XMill and FreeArc. I benchmarked the compression time and compression efficiency of each alternative across a test set of 441 filelists; this ADCPortal thread has methodological details and test scripts.

I tested 9 zlib settings, 9 bzip2 settings, 10 XMill settings, 16 xz settings and 21 FreeArc settings. Many of these were both slower and produced larger files than some other compressor/setting pair, and I started by filtering them out. As a result, every compressor/setting pair charted has some advantage, in speed or in size, over every other charted pair.
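The filtering step above is a Pareto-dominance filter. Below is a minimal C++ sketch. It assumes each pair's results are already expressed as time and size ratios against bzip2 -9. The `Result` struct and the function names are hypothetical and are not taken from the author's test scripts. Following the text, a pair is dropped only if another pair is strictly slower-free and strictly smaller, that is, better on both axes.

```cpp
#include <string>
#include <vector>

// One compressor/setting pair, measured relative to bzip2 -9,
// so bzip2 -9 itself sits at {1.0, 1.0}.
struct Result {
    std::string name;  // label only, e.g. "xz -6"
    double timeRatio;  // compression time / bzip2 -9 compression time
    double sizeRatio;  // compressed size / bzip2 -9 compressed size
};

// True if `a` is both faster and smaller than `b` (strictly on both axes).
bool beatsOnBoth(const Result& a, const Result& b) {
    return a.timeRatio < b.timeRatio && a.sizeRatio < b.sizeRatio;
}

// Drop every pair that some other pair beats on both time and size.
// No surviving pair is beaten on both axes by any other pair.
std::vector<Result> paretoFilter(const std::vector<Result>& all) {
    std::vector<Result> kept;
    for (const Result& candidate : all) {
        bool beaten = false;
        for (const Result& other : all) {
            if (beatsOnBoth(other, candidate)) {
                beaten = true;
                break;
            }
        }
        if (!beaten) kept.push_back(candidate);  // preserves input order
    }
    return kept;
}
```

A quadratic scan is plenty for the 65 compressor/setting pairs tested here.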

[Figure: Time vs size efficiency scatterplot (all filelists)]

[Figures: Size ratios (all filelists); Time ratios (all filelists)]
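The charts express each compressor/setting pair's results as ratios against bzip2 -9. The formula below is our interpretation of the plots, not a formula the author states; the exact aggregation over the 441 filelists is described in the ADCPortal thread. Let T_c be the compression time and S_c the compressed size for pair c. Read this way, the scatter plots place each pair c at the following coordinates:

```latex
x_c = \frac{T_c}{T_{\text{bzip2 -9}}}, \qquad
y_c = \frac{S_c}{S_{\text{bzip2 -9}}}, \qquad
\left(x_{\text{bzip2 -9}},\; y_{\text{bzip2 -9}}\right) = (1, 1)
```

Under this reading, a point near (4.4, 0.97) takes about 4.4 times as long as bzip2 -9 and produces lists about 3% smaller.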

The ratios are relative to the status quo, bzip2 -9, which sits at (1,1) in the scatter plot. So all compressor/setting pairs to the left of the points around (1,1) create larger filelists than bzip2 -9, but do so more quickly. The few points to the right, mostly far to the right, are slower but produce slightly smaller files, up to 4% smaller. Some FreeArc settings sit at the high end of the x axis, corresponding to FreeArc -5 and above. They are more space-efficient but 4 times slower, and they depend on hundreds of MiB of RAM to compress. Filtering them out leaves a more compact scatter plot:

[Figure: Time vs size efficiency scatterplot (all filelists, sans high FreeArc)]

This makes the remaining points easier to tell apart. It also shows that, across the set of all tested filelists, the status quo at (1, 1) produces smaller lists than every remaining pair. To its left lie mostly zlib and XMill variations (XMill preprocesses the XML filelist input, then feeds it to zlib), as well as a couple of low FreeArc settings. This distribution of compression time and size efficiency also holds, with minor adjustments, when the filelists are divided into quintiles by raw list size:

[Figures: Time vs size efficiency scatterplot (quintiles); Size ratios (quintiles); Time ratios (quintiles)]

The three points clustered around (4.4, 0.97) are, as before, high FreeArc settings. The previous charts obscured this, but these settings only become advantageous at all for the largest quintile of filelists, those above 3.4 MiB. For at least 80% of the 441 tested filelists, they provide negligible advantage of any sort. The only lists to which they offer much size advantage are the large ones. For example, 5% of a 5 MiB bzip2’d list saves 256 KiB, but 5% of a 1 MiB bzip2’d list saves only 51 KiB. Those large lists are also exactly where the 4x speed penalty bites hardest. The more CPU-intensive FreeArc settings are therefore needlessly wasteful for DC filelists, and removing them yields a more readable scatter plot:

[Figure: Time vs size efficiency scatterplot (quintiles, sans high FreeArc settings)]

As stated, the bzip2 -9 status quo sits at (1,1) by construction of the ratios. At best, the speed-wise feasible alternatives might yield a 2-3% reduction in compressed list size. That gain would come with several costs:
• 20% slower compression;
• maintaining two compressors indefinitely, and thus in practice compressing twice;
• adopting nonstandard or abandoned (XMill), albeit still open-source, compressors.

bzip2 -9, the existing status quo, has proven a remarkably durable compromise between speed and compressed size. Even given the opportunity to select a compression algorithm without regard for backwards compatibility, I would choose it again for a DC-like protocol. Combined with the high cost of transitioning to anything else, this makes bzip2 -9 the optimal filelist compression algorithm for DC.

Interview series: poy

This is one part of a blog series where influential people in Direct Connect are interviewed about the past, present and future state of Direct Connect.


poy is a developer known for his influence on the development of DC++ and ADCH++. He raised code base standards to a new high and saw the UI library transition in DC++ through to a good end. He has also been one of the main developers of DC++ and ADCH++, and more.

What follows are the questions I asked and the answers he gave.

  • What made you start with Direct Connect, and when?

About 8 years ago, back when all I knew of P2P was eMule, KaZaA, Limewire, emerging torrents and other faceless global file sharing solutions, a friend of mine told me about this great gem he had just found, where getting into servers depends upon how well and how much you share yourself.

  • When did you start with the development of DC++?

Looking into the changelog, my first patches landed in version 0.695, released 5 years ago. Before that, I had been spending most of my C++ coding time modding my own version of DC++.

  • What made you interested in the development of DC and DC++?

DC has a unique set of communities that care about sharing in a sense that I fully appreciate. It is therefore always a pleasure to look into its inner workings.

Because it was reverse-engineered, the original DC protocol has always left me with a slightly bitter feeling, which got me looking for a better, more future-proof alternative. Fortunately, by the time I got interested in that, ADC was already on its way up. It was, however, far from perfect, especially given the lack of available hub programs. This has led to a constant state of motion, ideas and achievements, which quickly appealed to me.

For DC++ specifically, I enjoy its use of many modern programming constructs, its large user base and the fact that there are still so many things to do with it.

  • You are one of the main developers of DC++; what are your goals and ideas?

One of my primary goals is, and has always been, to be able to replace the mod I used to develop a while ago. It included various features I have yet to see in any other DC client. These include a more evolved chat facility, configuration per hub or per hub group, various UI fanciness…

I have many ideas, but mostly smaller, short-term ones. Just dropping some from my todo list:
• ways to filter lists with regular expressions;
• using the binary and/or city databases of GeoIP;
• adding right-click accelerators similar to those in Internet Explorer.

One unstated goal of the UI library that DC++ uses (formerly SmartWin++, now DWT for “DC++ Widget Toolkit”) is to support more than just Windows and to be released separately. Seeing applications other than DC++ use DWT would be quite an accomplishment.

Another ongoing goal is to keep DC++ on top of the latest changes in C++ standards. The C++ committee has been quite productive these last few years.

  • You are also involved with the development of ADCH++; what are your goals and ideas?

My ultimate goal for ADCH++ would be a hub that anyone can easily configure, yet that expert users can still enjoy. I believe the latter part is close to reality right now. The former, however, is not quite there yet, judging by the feedback we are getting. A GUI would be most welcome. One was in development at one point, but it seems to have halted; I hope it gets picked up again.

Aside from that, I consider ADCH++ to be quite complete; its author has done a wonderful job with it. A few features may not yet be available, but they are all doable with plugins/scripts, so that doesn’t really fall on ADCH++ itself.

  • Do you have any goals and ideas with the further expansion of NMDC and/or ADC?

I hope ADC can provide a fully secure environment; it is already possible to encrypt ADC traffic, but that doesn’t mean it is completely secure.

I hope ADC can reach a state comparable to other Internet protocols such as HTTP, FTP, etc.

I have been wondering about an ADC extension idea that I would call HFAV, where clients would send the hub addresses they know about to hubs, which in turn would poll all the addresses they have gathered, regularly ping them and dispatch the results to any client that requests them.

  • What other DC projects are you involved in, and what are your goals and ideas with those?

I have written DCBouncer mostly for my own use. It stays connected on a server I own and simulates a presence on some hubs, gathering chat messages (including private ones). Then, when I log back in, it forwards them to me. There are multiple usage scenarios: logging, hiding one’s actual IP, and perhaps in the future even sharing directly from the server…

  • What do you think will attract more users of DC in general? Would that be desirable?

Internet ads praising the DC idea would, in my opinion, be a great way to attract users. Another is to troll P2P forums.

The main reason I like DC is its elitist philosophy. The better and the faster you share, the more likely you are to get invited to better hubs, with users of better sharing quality. If I wanted some kind of global server where anyone could enter, I would prefer the programs previously mentioned. That is why I am not a fan of hubs that strive for the maximum number of users and disregard the community aspect. It is especially why I dislike the DHT feature that some clients have implemented, as DHT takes that idea of faceless hubs to an extreme.

The stuff one can find on global P2P networks and faceless hubs is very different from what is available in elitist DC communities. I believe luring users of the former category into DC would not necessarily be beneficial. It would be more interesting to go after those who are likely to contribute to the very unique DC sharing spirit.

  • Would you change anything with DC, DC++, ADC etc if you could?

All the ideas I have had so far have been quite realistic; I guess I tend to unconsciously discard those I couldn’t implement myself.

  • How much time do you spend on DC development?

This depends a lot on real life, on other Internet groups I frequent and on my motivation to accomplish a particular task. At times I have spent 6 hours straight; other times, just 15 minutes on an otherwise awesome change. I try to at least check what’s up once a day, or once a week during busy weeks.

  • What would cause you to spend more time with development?

An awesome idea that would have many code ramifications; or a crazily hard-to-find bug.

  • What was the most difficult thing you have done with DC?

I would be tempted to mention the recent crash logger, which led me to fully read the libdwarf, DWARF 2 and PE/COFF documentation. This was a unique achievement, so at first I had no example of how to go about it. But in retrospect, this wasn’t quite that difficult.

The hardest thing I can remember is tracking down bug 590651. It was a leak of a graphics object that rendered all debug reports useless. We ended up asking some testers to run several versions of the program at the same time, trying to find a pattern in the crashes. Eventually one tester postulated that the crashes were related to the number of hub disconnects the program received. That wasn’t much to go on, but I eventually figured it might be related to tab icon changes on disconnects. Although the fix was trivial, pinpointing the bug was quite the hardship.

  • What was the most important thing you have done with DC?

I am quite proud of the DC++ window manager, which restores the session to the way it was before the program was closed. This goes with the menu of recent windows in the toolbar.

  • What was the most fun thing you have done with DC?

Hard to say, it’s always fun and a unique sensation of accomplishment that only a developer can know of. :) But I guess ADCH++-PtokaX is pretty fun in its own ironic way.



Interview series: cologic

This is one part of a blog series where influential people in Direct Connect are interviewed about the past, present and future state of Direct Connect.


cologic is a developer known for his influence on the development of BCDC++, DC++ and ADC. He initiated the creation of BCDC++ and the introduction of hashes, Lua scripting and more.

What follows are the questions I asked and the answers he gave.

  • What made you start with Direct Connect, and when?

I started using Direct Connect in 2002 primarily out of curiosity.

  • When did you start with the development of DC++?

I first made BCDC++ and contributed to DC++ in 2003.

  • What made you interested in the development of DC and DC++?

I found using DC, especially through DC++, interesting. However, due to DC++’s immaturity in 2003, it lacked useful features. The ones I wanted I had to find or write code for, creating interest in the development of DC and DC++.

  • What prompted BCDC++?

(1) Upload limiting, though important to rendering DC usable, was commonly regarded as akin to cheating or faking. Thus, I involved myself in DC development to obtain a client providing it;

(2) Because many hubs allowed NMDC but not DC++, I desired NMDC client emulation. This was gone no later than 2005, by which time DC++ was near-ubiquitously allowed. However, that was first complemented then supplanted by…

(3) Description tags, conceded to persuade hub operators to stop banning DC++ per item 2, leaked excessive information and provided too much power to hubowners. They obviously weren’t disappearing from DC++, so I removed tags from BCDC++ for some years before description tag faking became more viable than outright tag removal. (BCDC++ has neither emulated clients nor faked description tags for more than three years.);

(4) It took surprisingly long for DC++ to encode $ and |, used in the NMDC protocol, such that one could use them in chat. I fixed that in BCDC++ before DC++ picked up a workaround;

(5) Lua scripting looked quite useful on the hub side and I wanted similar capabilities in a client. I thus incorporated a Lua scripting interface into BCDC++.

  • What were your goals with BCDC++? Do you have any goals now for it? Do you see a future still for it?

I incorporated two kinds of features. Some were features that hubowner-dominated DC politics kept out of the main clients. Others satisfied niche desires of mine that were too obscure to justify including in a more mainstream client.

I plan now to merge the remaining important features from BCDC++ into DC++ and deprecate it. All recent major BCDC++ changes, such as NAT traversal, were promptly sent upstream to DC++. The remaining blocking feature is the script support, waiting upon a DC++ plugin API.

  • You have been involved with the development of ADC. Do you have any goals and ideas?

Most broadly, I did and continue to push to remove NMDC’s flaws. These include, in reverse order of importance:

(1) plaintext password transmission. Combined with (2), this means that NMDC has no security against even passive eavesdroppers. ADC uses a challenge-response login protocol to fix this;

(2) lack of encrypted connection support. This continues to render NMDC vulnerable to even passive ISP interception. ADC supports SSL in the form of ADCS;

(3) two symptoms of essentially the same issue: trivial nick-faking, and multiple shares per client per port not functioning. ADC adds additional client-client session authentication to avoid these issues;

(4) finally, its lack of extensibility. Hubs routed only a small, fixed set of commands from one user to another, so people hijacked commands such as $SR and $To to implement CTCP-like messaging. Towards this, I supported separating message type from message content.

  • What do you think will attract more users of DC in general? Would that be desirable?

I could not confidently predict specifically what would attract more DC users. Generically, I would expect reducing barriers to entry to work. Increased user volume seems necessary to keep DC above critical mass. Network effects dominate – I tend to believe that the value of the network is roughly proportional to the square of the number of users, so each gain or loss of users matters substantially.

  • Would you change anything with DC, DC++, ADC etc if you could?

DC software I see as implementing a chat and filesharing system which supports indefinitely sized, browseable individual shares which allow people to organize and share personal collections; allow for private networks; tend to operate on a small scale by modern P2P standards; and support social interactions with the same set of users sharing files.

I support DC remaining primarily in this role. Breaking those assumptions tends to make other, already-extant P2P software highly competitive with the best DC software. Within that niche, however, DC clients, especially DC++, are the best solution I’ve found. Therefore, I find the DC topology broadly adequate.

Taking DC as given, DC++ implements the concept well. Its primary deficiencies are lack of plugin and scripting support and lack of multiple share support. I’m largely satisfied, however.

ADC, low usage levels aside, has worked out well as a protocol. I’d fix a couple of things though:

(1) the login password protocol is weak, not using secure cryptographic hash-based constructions. Standard solutions exist but aren’t backwards compatible and it’s unclear the current specification justifies breaking such compatibility. Further mitigating its impact, it’s irrelevant in ADCS. Still, it’s an obvious mistake;

(2) experience has belied the conceit that global identification works. It’s caused problems in private message routing and CTM routing, both revealing that not all hubs on which a given pair of users both are connected are functionally identical nor should they be treated as such. A private message is less private on some hubs than others, CTMs are blocked by some hubs but not others, and there’s no standard way to communicate these distinctions to a client. Untrustworthy heuristics emerged to work around a false assumption of substantial similarity between hubs that should have never been present.

  • How much time do you spend on DC development nowadays?

Not much. I’ve written a couple of sizable features in the last few months – writing NAT traversal and rewriting bandwidth limiting (with BigMuscle’s help) were short, intense bursts – but it’s otherwise in maintenance mode.

  • What would cause you to spend more time with development?

Finding something intriguing and novel best done within DC++ rather than on its own.

  • What are your short and long term goals with DC that you would want to achieve?

I’m watching the NIST SHA3 competition and have proposed a transition from the already-weakened Tiger to SHA3-based hashing.

I’d like to use ADCS to encourage more encryption use on the Internet. When governments unabashedly place dragnets in key routing nodes and shunt off all the traffic they can get – last year, the NSA monitored 1.7 billion emails per day, for example – and ISPs openly run deep packet inspection, encryption suggests itself.

  • What is the most difficult thing you have done with DC?

Learning and working around sparsely-documented Windows RichEdit control quirks.

  • What is the most important thing you have done with DC?

Introducing Tiger Tree Hashes (TTHes). Their presence has fundamentally and positively altered DC. They’ve cleared hubs of the identically-named but undetectably and differently corrupt files that used to fill them. Further, TTHes allow for workable multisource downloading. Even single-source downloading between users is now actually reliable, because TTHes allowed old hacks such as rollback to be removed. (Auto)search for alternate sources doesn’t depend on filename and thus works more reliably. DC is a much more reliable and usable network because it relies on TTHes.

  • What is the most fun thing you have done with DC?

My favorite episode was the short-lived bier.lua fallout; “bier? ja lekker! :)” might trigger memories. Also amusing has been the continued reaction to displaying users’ IPs in the userlist, search frame, and transfer frame.



Base32 encoding in ADC

The ADC Protocol specification is well-defined for the most part, but there is a lack of information on Base32 encoded strings. Hopefully this post helps clear up a few things relating to them.

Base32 encoding is specified in RFC 4648. To convert bytes into a Base32-encoded string, work through the data in 40-bit chunks:
• Take the next 40 bits of data and divide them into 8 groups of 5 bits.
• Convert each 5-bit group to its character representation, using the Base32 lookup table.
• Repeat until there is no more data left to encode.
If the total number of bits is not divisible by 40, the final 40-bit group will be short of bits. In that case, pad the last 5-bit group that contains data with 0’s (if needed). Then set the remaining characters of that 8-character group to the padding character (‘=’).

The padding character can be omitted, but only if the specification referring to the RFC explicitly says so. When padding is omitted, taking the input in 40-bit chunks becomes unnecessary. Simply take the input 5 bits at a time until the end of the data. If the number of bits in the input isn’t divisible by 5, pad the last group of 5 with 0’s.
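Below is a minimal C++ sketch of this encoding. The names `base32Encode` and `kBase32Alphabet` are hypothetical; this is not DC++'s own encoder. It consumes the input a byte at a time rather than in explicit 40-bit chunks, which produces the same characters. It appends ‘=’ padding only when asked, since the recap below concludes that ADC omits it.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// RFC 4648 Base32 alphabet: A-Z, then 2-7.
const char kBase32Alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Encode bytes as Base32. With pad == false the output stops after the last
// 5-bit group; with pad == true it is filled with '=' to a multiple of 8 chars.
std::string base32Encode(const std::vector<std::uint8_t>& data, bool pad = false) {
    std::string out;
    std::uint32_t buffer = 0;  // pending bits live in the low `bits` bits
    int bits = 0;              // number of pending bits (always < 13)
    for (std::uint8_t byte : data) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 5) {  // emit every complete 5-bit group
            out += kBase32Alphabet[(buffer >> (bits - 5)) & 0x1F];
            bits -= 5;
        }
    }
    if (bits > 0) {
        // Left-align the leftover bits in a final group, padding with 0's.
        out += kBase32Alphabet[(buffer << (5 - bits)) & 0x1F];
    }
    if (pad) {
        while (out.size() % 8 != 0) out += '=';
    }
    return out;
}
```

For example, the bytes of "foobar" encode to `MZXW6YTBOI` unpadded, or `MZXW6YTBOI======` with padding, matching the RFC 4648 test vectors.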

To convert a Base32-encoded string back to raw bytes, first generate a binary representation of the encoded string using the Base32 lookup table. Then take every 8 bits and store them in a byte. If the length of the encoded string multiplied by 5 isn’t divisible by 8, the extra bits are discarded.
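The decoding direction can be sketched the same way. The names `base32Decode` and `base32Value` are hypothetical, and the sketch needs C++17 for `std::optional`. It assumes unpadded input, as in ADC, and rejects any character outside A-Z and 2-7, including ‘=’.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Map one Base32 character to its 5-bit value, or -1 if it is not in the alphabet.
int base32Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= '2' && c <= '7') return c - '2' + 26;
    return -1;
}

// Decode unpadded Base32; bits left over after the last full byte are dropped.
std::optional<std::vector<std::uint8_t>> base32Decode(const std::string& text) {
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // pending bits live in the low `bits` bits
    int bits = 0;              // number of pending bits (always < 13)
    for (char c : text) {
        const int v = base32Value(c);
        if (v < 0) return std::nullopt;  // '=' or any other stray character
        buffer = (buffer << 5) | static_cast<std::uint32_t>(v);
        bits += 5;
        if (bits >= 8) {  // at most one full byte can complete per character
            out.push_back(static_cast<std::uint8_t>((buffer >> (bits - 8)) & 0xFF));
            bits -= 8;
        }
    }
    // The remaining 0-7 bits are the encoder's zero padding; discard them.
    return out;
}
```

For example, `MY` (10 bits) decodes to the single byte 'f', and its 2 trailing zero bits are discarded.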

The only clue as to whether Base32-encoded strings should be padded is the line ‘base32_character ::= simple_alpha | [2-7]’. In my mind, this does not explicitly state that padding should be omitted. It simply states what Base32 characters can be, and leaves the interpretation up to developers. It’s a small leap, but enough that it could make people look through alternate sources for confirmation.

A quick recap:
When converting from raw bytes to Base32, pad the extra bits with 0’s to make the final character and omit the padding character(s). When converting from Base32 to raw bytes, discard the extra bits.

I learned this by searching through the Base32 specification and the DC++ source code, Googling, guesswork, and trial and error. Some developers choose to implement an ADC-compliant application from scratch, without using the DC++ core (which is developed by the author of the ADC protocol). A few additional footnotes in the ADC protocol specification would go a long way for them.

This post was written by pR0Ps, the author of NetChatLink.

If you have something you want to post, drop a note in the suggestion box or mail me.


YouTube channel: The Direct Connect Network

Recently, I started the YouTube channel The Direct Connect Network.

The intent is to provide tutorials for, and introductions to, different software within the DC network.

There aren’t many videos as of yet, but the idea is that any DC-related video can make its way to the channel.

If you have anything you want covered in a video, post here or comment on any of the videos posted.

(It had already been noted elsewhere, but not here, hence this post…)


Syntax diagram of ADC BNF

I went ahead and generated some syntax diagrams for the ADC BNF at http://www-cgi.uni-regensburg.de/~brf09510/syntax.html.

I used W3C-BNF, since that is (almost) the notation the ADC specification states its BNF in. The following is the input:

...

[1] message ::= message_body? eol
[2] message_body ::= (b_message_header | cih_message_header | de_message_header | f_message_header | u_message_header | message_header)
(separator positional_parameter)* (separator named_parameter)*
[3] b_message_header ::= 'B' command_name separator my_sid
[4] cih_message_header ::= ('C' | 'I' | 'H') command_name
[5] de_message_header ::= ('D' | 'E') command_name separator my_sid separator target_sid
[6] f_message_header ::= 'F' command_name separator my_sid separator (('+'|'-') feature_name)+
[7] u_message_header ::= 'U' command_name separator my_cid
[8] command_name ::= simple_alpha simple_alphanum simple_alphanum
[9] positional_parameter ::= parameter_value
[10] named_parameter ::= parameter_name parameter_value?
[11] parameter_name ::= simple_alpha simple_alphanum
[12] parameter_value ::= escaped_letter+
[13] target_sid ::= encoded_sid
[14] my_sid ::= encoded_sid
[15] encoded_sid ::= base32_character base32_character base32_character base32_character
[16] my_cid ::= encoded_cid
[17] encoded_cid ::= base32_character+
[18] base32_character ::= simple_alpha | [2-7]
[19] feature_name ::= simple_alpha simple_alphanum simple_alphanum simple_alphanum
[20] escaped_letter ::= [^ \#x0a] | escape 's' | escape 'n' | escape escape
[21] escape ::= '\\'
[22] simple_alpha ::= [A-Z]
[23] simple_alphanum ::= [A-Z0-9]
[24] eol ::= #x0a
[25] separator ::= ' '

...

(Note that W3C-BNF doesn’t support repetition counts such as ‘{3}’, so I had to write those instances out in full.)
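Rule [20] means a parameter value cannot contain a raw space, newline or backslash. These characters travel as \s, \n and \\ instead. Below is a minimal C++ sketch of escaping and unescaping a single parameter value under that rule. The names `adcEscape` and `adcUnescape` are hypothetical, and the sketch needs C++17. The unescaping side rejects any escape sequence the grammar does not allow.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Escape a raw parameter value per rule [20]:
// space -> "\s", newline -> "\n", backslash -> "\\".
std::string adcEscape(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case ' ':  out += "\\s"; break;
            case '\n': out += "\\n"; break;
            case '\\': out += "\\\\"; break;
            default:   out += c; break;
        }
    }
    return out;
}

// Reverse adcEscape. Returns std::nullopt for input rule [20] does not allow:
// a raw space or newline, a lone trailing backslash, or an unknown escape.
std::optional<std::string> adcUnescape(const std::string& escaped) {
    std::string out;
    for (std::size_t i = 0; i < escaped.size(); ++i) {
        const char c = escaped[i];
        if (c == ' ' || c == '\n') return std::nullopt;  // must be escaped
        if (c != '\\') {
            out += c;
            continue;
        }
        if (++i == escaped.size()) return std::nullopt;  // backslash at end
        switch (escaped[i]) {
            case 's':  out += ' '; break;
            case 'n':  out += '\n'; break;
            case '\\': out += '\\'; break;
            default:   return std::nullopt;
        }
    }
    return out;
}
```

Escaping applies to each parameter value separately. The single spaces of rule [25] that separate parameters are never escaped.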

The following is the output: [Figure: syntax diagrams generated from the rules above]


Copyright notice for Tiger implementations

A while back, Jacek and I began discussing the Tiger implementation in DC++. The implementation is a C++ port of the original C code available on the official Tiger website: http://www.cs.technion.ac.il/~biham/Reports/Tiger/

What was noticeable about the C code was that it had no license attached. This means the implementation falls under default copyright law (in this case, US law). As such, any modification or derivative work (which the C++ implementation might be considered) is most likely not allowed without explicit permission.

I sent an e-mail to the authors to rectify the situation and make sure there is sound legal ground for DC++, other derivatives and users of the C++ code.

The following is what I sent to Eli Biham, one of the authors:

Under what license is your C implementation of the Tiger (http://www.cs.technion.ac.il/~biham/Reports/Tiger/) algorithm? The source code doesn’t state any license explicitly, nor does the Tiger main page. As it stands now, it is not possible to use your implementation, as is, in an application without getting explicit permission from you.

Biham responded:

Dear Fredrik,

I hereby allow you to use it, provided it will compute Tiger, and state
the names of the authors in it.

Clearly, the usual disclaimers hold, e.g., that it’s use will be legal,
and that it will not be exported to countries banned by law, that the
authors will not be responsible for the code, your software, nor
anything else.

Regards,

Eli

To get cleaner phrasing suited to source code, I rephrased the conditions and added the following to the DC++ source (in TigerHash.cpp/h, in DC++ 0.780):

/*
* The Tiger algorithm was written by Eli Biham and Ross Anderson and
* is available on the official Tiger algorithm page .
* The below Tiger implementation is a C++ version of their original C code.
* Permission was granted by Eli Biham to use with the following conditions;
* a) This note must be retained.
* b) The algorithm must correctly compute Tiger.
* c) The algorithm’s use must be legal.
* d) The algorithm may not be exported to countries banned by law.
* e) The authors of the C code are not responsible of this use of the code,
* the software or anything else.
*/

If you are using the C implementation or a derivative of it (including DC++’s implementation), you must include a similar notice.

Feel free to use my phrasing or write your own (adhering to Biham’s restrictions).

