PtokaX forum

Development Section => Your Developing Problems => Topic started by: scsigirl on 20 July, 2005, 03:10:46

Title: How to read a .DcLst file
Post by: scsigirl on 20 July, 2005, 03:10:46
I hope I'm posting this in the right place this time...

Please let me know if there is any way to uncompress or convert a .DcLst file to ASCII. The .bz2 files unarchive OK.

The reason behind my request:

In my audiobook hub, most users are not able to keep their entire collection online, so many of us maintain a list of what we have archived on CD/DVD and export it as an ASCII file, which is then shared.

These ascii lists have all been collected into a folder on the host machine to create the catalog.

I have written a script that allows users to search this catalog for any combination of keywords and the results are displayed in a window, each line tagged with the user that owns the audiobook.

Members can then PM the owner and request that the desired book be shared for downloading.

Users also have the ability to upload an updated list at any time thru the script.

In addition to searching the Archive Catalog, it has the ability to search a catalog of DC lists of users that are currently offline. So far I've only been able to add the .bz2 lists and none of the .DcLst files because I haven't been able to uncompress or convert them to ascii.

Any help would be greatly appreciated.

Post by: scsigirl on 20 July, 2005, 04:12:52
I would love to, if I knew a way to force my client to download just bz2 files, but I end up with a mix of bz2 and dclst files, depending on what clients I am downloading the lists from.

I want to convert all of them to ascii files and dump them all in a folder that the script can search.
Post by: Pothead on 20 July, 2005, 11:46:34
.DcLst is what NMDC sends (I think).
It is in the "Huffman" (HE3) encoding. :)
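If you end up rolling your own, community protocol notes describe the HE3 layout as: a "HE3\13" magic, one checksum byte, a 4-byte little-endian decompressed size, a 2-byte little-endian leaf count, then (character, code-length) pairs, then the leaf codes and the data packed one bit at a time, least significant bit first. Here is a rough Lua 5.1 decoder sketch under those assumptions; it has not been verified against a real client dump, so treat it as a starting point only:

```lua
-- Rough HE3 (.DcLst) decoder sketch, assuming the layout described in
-- community protocol notes (see lead-in). Untested against real dumps.

local function bit_reader(s, pos)
  local byte, nbits = 0, 0
  return function()
    if nbits == 0 then
      byte = string.byte(s, pos)
      pos = pos + 1
      nbits = 8
    end
    local b = byte % 2          -- extract the least significant bit
    byte = (byte - b) / 2
    nbits = nbits - 1
    return b
  end
end

local function decode_he3(data)
  assert(string.sub(data, 1, 4) == "HE3\013", "not an HE3 file")
  local function le(p, n)       -- n-byte little-endian number at position p
    local v, m = 0, 1
    for i = 0, n - 1 do
      v = v + string.byte(data, p + i) * m
      m = m * 256
    end
    return v
  end
  local outlen = le(6, 4)       -- byte 5 is a checksum, ignored here
  local leaves = le(10, 2)
  local chars, lens, p = {}, {}, 12
  for i = 1, leaves do          -- (character, code-length) pairs
    chars[i] = string.byte(data, p)
    lens[i] = string.byte(data, p + 1)
    p = p + 2
  end
  -- read each leaf's Huffman code as a "0"/"1" string
  local nextbit = bit_reader(data, p)
  local codes, total = {}, 0
  for i = 1, leaves do
    local c = {}
    for j = 1, lens[i] do c[j] = nextbit() end
    codes[table.concat(c)] = string.char(chars[i])
    total = total + lens[i]
  end
  -- the compressed data starts on the next whole byte
  nextbit = bit_reader(data, p + math.ceil(total / 8))
  local out, n, code = {}, 0, ""
  while n < outlen do
    code = code .. nextbit()
    local ch = codes[code]      -- prefix code matched?
    if ch then
      n = n + 1
      out[n] = ch
      code = ""
    end
  end
  return table.concat(out)
end
```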
Post by: scsigirl on 20 July, 2005, 13:30:56
I did try the client you suggested (along with a handful of others) and got .xml.bz2 files, which have a lot more superfluous info that the script would have to trim off to keep it readable. So I have mixed feelings about that solution, but it is possible if all else fails.

I'd still rather find a way to decode the .dclst files.

Pothead, do you know what program can decode them?

I did like your other suggestion to execute a timed task to extract the files automatically though, very nice.

That brings up another question....

I'm already using execute to start a DC++ client.  After a user has queued an upload of their list, the script writes an entry to the QUEUE.XML file for a specified client on the host machine, then fires it up.  The client downloads the list, but at that point I need to close it manually, because any further additions to the QUEUE.XML file are not seen until the client restarts.  Do you know of any clients that will exit when the download queue is empty, or reread the QUEUE.XML file while running?
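In case it helps anyone else, the queue-writing idea boils down to something like this in Lua. The Download/Source attribute names here are placeholders; copy whatever your client version actually writes into its queue file:

```lua
-- Sketch: append a filelist download to a client's queue XML before
-- launching it. Attribute names are illustrative, not authoritative.

local function queue_entry(nick, target)
  return string.format(
    '\t<Download Target="%s" Size="-1" Priority="3">\n' ..
    '\t\t<Source Nick="%s"/>\n\t</Download>\n',
    target, nick)
end

local function add_to_queue(path, nick, target)
  local f = assert(io.open(path, "r"))
  local xml = f:read("*a")
  f:close()
  local entry = queue_entry(nick, target)
  -- splice the new entry in just before the closing tag
  xml = string.gsub(xml, "</Downloads>",
                    function() return entry .. "</Downloads>" end, 1)
  f = assert(io.open(path, "w"))
  f:write(xml)
  f:close()
end
```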

Tks for all your help
Post by: scsigirl on 20 July, 2005, 14:34:31
As I continue to search the net for Huffman/Lua references, I'm getting this gnawing feeling that I'll have to resort to writing my own decoder in Lua.  I did find some C encode+decode routines which will be a good reference, but this is probably my least desirable option...

Does an HE3 Lua decoder already exist?  I wasn't able to find anything here when I did a search.

Tks, scsigirl
Post by: scsigirl on 21 July, 2005, 23:01:51
I think you have a valid point, and it only took a few lines of code to strip out the unwanted text from the XML file before printing matching records.

As it stands now, I have collected all filelists of registered members so that the script can search them and report matches for users that may be offline at the time.

But it's growing unwieldy very quickly: 355 MB across 187 files.
I'll have to come up with a way to break it up so the script doesn't stall out the hub while scanning all the files.
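One approach I'm considering: drive the scan from the hub's periodic timer with a coroutine, so the script only chews through a slice of the catalog per tick instead of blocking the hub. Rough sketch below; the timer hook name is a placeholder, since it depends on the PtokaX API version:

```lua
-- Sketch: scan catalog files a slice at a time from a timer callback
-- so the hub's main loop is never blocked for long.

local scanner = nil   -- the in-progress scan, or nil when idle

local function start_search(files, keyword, report)
  keyword = string.lower(keyword)
  scanner = coroutine.create(function()
    for _, path in ipairs(files) do
      local n = 0
      for line in io.lines(path) do
        if string.find(string.lower(line), keyword, 1, true) then
          report(path, line)
        end
        n = n + 1
        if n % 2000 == 0 then coroutine.yield() end  -- hand control back
      end
      coroutine.yield()                              -- at least once per file
    end
  end)
end

-- call this from the hub's periodic timer (hook name varies by version)
local function on_timer()
  if scanner then
    if coroutine.status(scanner) == "dead" then
      scanner = nil
    else
      assert(coroutine.resume(scanner))
    end
  end
end
```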

Were you also implying that DCDM can search saved filelists ?
if so please tell me more...

Thank you for all your suggestions,
Post by: scsigirl on 21 July, 2005, 23:58:50
I think I'm following you... but I should clarify a little too: all members in this hub share audiobooks exclusively, so there is no need to browse for content, only to search for a specific author or title when a user types it to the script, e.g. !search azkaban
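The matching itself is simple: every word the user types after !search just has to appear somewhere in the catalog line, case-insensitively. Something like this (Lua 5.1):

```lua
-- Sketch: a line matches only if it contains every word of the query,
-- compared case-insensitively with plain (non-pattern) find.

local function matches(line, query)
  line = string.lower(line)
  for word in string.gmatch(string.lower(query), "%S+") do
    if not string.find(line, word, 1, true) then
      return false
    end
  end
  return true
end
```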

Are you saying there is a way to pass this request to the client and have it search previously downloaded filelists, as opposed to users currently online?  That's my goal.
Post by: scsigirl on 22 July, 2005, 06:27:09
I follow what you're saying now, but I don't think that will help me in this case.   The goal here is to find either files that are stored offline or files that are kept online by users that are temporarily offline.

In both cases the script is searching previously stored file lists.  Nothing generated by a client at the moment of the search is going to find the content we are looking for.

A specific example may help here.  Say I'm looking for an older, obscure audiobook title that only a couple of users out of a few hundred happen to have archived on CD, and they don't bother to share it anymore because there is so little interest in it.  This search will tell me who has a copy so I can ask them to share it.

Or a title that only a couple of users have shared, but who are not in the hub at the time of the search; this will find that as well.  Then I can pull up their filelist and queue the book so the download will start the next time they enter the hub.  I do this frequently, and very often the files are waiting for me when I return.

I am sharing the pool of filelists as well, so other users can do the same.

The filelist search is working great now using the XML files.  The only drawback is that it takes 3 minutes to scan 350 MB of text.  Now I will be looking at streamlining the code to get better speed out of it.
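One idea for the streamlining: index each filelist once, mapping every word to the line numbers it appears on, so a query only has to intersect small hit lists instead of rescanning 350 MB of raw text. A rough sketch of the idea:

```lua
-- Sketch: a per-file inverted index (word -> sorted line numbers) so
-- repeated queries avoid rescanning the raw catalog text.

local function build_index(lines)
  local index = {}
  for i, line in ipairs(lines) do
    for word in string.gmatch(string.lower(line), "%w+") do
      local hits = index[word]
      if not hits then hits = {}; index[word] = hits end
      hits[#hits + 1] = i
    end
  end
  return index
end

local function lookup(index, query)
  -- intersect the line sets of every word in the query
  local result = nil
  for word in string.gmatch(string.lower(query), "%w+") do
    local hits = index[word]
    if not hits then return {} end        -- word never seen: no matches
    if result == nil then
      result = {}
      for _, i in ipairs(hits) do result[i] = true end
    else
      local keep = {}
      for _, i in ipairs(hits) do
        if result[i] then keep[i] = true end
      end
      result = keep
    end
  end
  local out = {}
  for i in pairs(result or {}) do out[#out + 1] = i end
  table.sort(out)
  return out
end
```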

Thank you again for your feedback, it will keep me thinking.

Post by: scsigirl on 22 July, 2005, 08:29:42
Ok, now I do understand what you are saying about the filelists.

I do have a routine which strips out the file detail lines; since almost all audiobooks are stored in separate folders, we only need to preserve the directory entries.  But I've gotten lazy and haven't been using it.  I'll have to automate more of the pre-processing, because it does make a huge difference, as you say.
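For the curious, that stripping can be as simple as pulling the Directory names out of the XML filelist. This sketch assumes the Name="..." attribute layout the clients write, so double-check against your own filelists:

```lua
-- Sketch: keep only the <Directory> names from an XML filelist,
-- discarding per-file detail, to shrink the searchable catalog.

local function directories_only(xml)
  local out = {}
  for name in string.gmatch(xml, '<Directory Name="([^"]*)"') do
    out[#out + 1] = name
  end
  return out
end
```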

The data that you refer to as not having been loaded comes from all of the users who maintain their own database of what they have archived offline, and it can be in any format.  So I will still need to scan those ASCII files.

I didn't mean to sound content just because it works.  I think of it as a work in progress, evolving as I go.  There are always new ways to build a better mousetrap eh?

more good food for thought... thank you!