P.S: No offence is intended to documentation writers, nor to any documentation. I love the effort you guys put into writing good, (ahem) easy-to-follow manuals for software ❤️
This was created as a fun side project to learn some RAG. Please don't get mad at me if it doesn't meet your expectations.
Do you ever get tired of just reading documentation?
Do you wish for a program you can just ask questions and get answers from?
Well, this piece of majesticity is (probably) just for you!
Glad you asked. FTheDocs is a super advanced documentation querying tool that you can use to get information from documentation faster and (maybe) more efficiently.
It first embeds all your docs. Then it embeds your query and finds the piece of documentation whose embedding is closest to the query's embedding.
In short, it's software for querying a vector database.
- You first stuff the entire documentation (which you should scrape and store into either a .txt or .json file) into FTheDocs
- FTheDocs then builds a 'collection' of those documents after parsing them, while also taking into consideration any settings you have given it (YOU CAN CUSTOMIZE IT)
- Then it puts you in 'asking mode', where you can ask the collection what you want to know from the docs
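The three steps above can be sketched in a few lines of plain Python. This is only a toy illustration, not how FTheDocs is implemented: the "embedding" here is a bag-of-words count instead of a real embedding model, and the names `embed`, `build_collection` and `query` are made up for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_collection(docs):
    """Embed every document once, up front."""
    return [(doc, embed(doc)) for doc in docs]


def query(collection, question, n_results=1):
    """Return the n_results documents closest to the question."""
    q = embed(question)
    ranked = sorted(collection, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:n_results]]


# Hypothetical documents, loosely in the spirit of a sockets guide.
docs = [
    "socket() returns a socket descriptor",
    "bind() associates a socket with a port",
    "listen() waits for incoming connections",
]
collection = build_collection(docs)
print(query(collection, "which call waits for connections"))
```

A real setup would swap `embed` for an actual embedding model and the sorted list for a vector database, but the flow (embed the docs, embed the query, return the nearest match) stays the same.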
THAT'S EASY. Just git clone this repo :)
git clone https://github.com/muaaz-ur-habibi/fthedocs.git
Then just go into the directory and run the command
python fthedocs.py --help
to see the help menu. Or just read the documentation below for more details.
- Pretty console UI using the Rich library
- Question-Answer style querying system
- Verbose output of whatever process is currently on-going (still working on this)
- Settings to allow the program to be fitted according to your documentation
- Free AND Open-source
When cloning the repo, you also get a test.txt and a test.json file. These are scraped versions of Beej's C Sockets Guide, and they are the test documents I used for testing FTheDocs. You can use them to play around with it as well.
python fthedocs.py --file test.txt
This is the most basic way to use FTheDocs. The command loads the test.txt file and puts you in asking mode using the default settings.
The argument `--file` is used for .txt files. To use .json files, use the argument `--json` to specify a .json file
When using JSON, the `--json-path` argument becomes compulsory.
It is the sequence of keys FTheDocs must follow through the .json file to reach the target text, which it then converts into a list of texts.
Think of it like this:
{
    "main": {
        "key_1": {
            "key_2": ["target_text_as_list"]
        }
    }
}
In order to reach all the desired texts, in this case ["target_text_as_list"], FTheDocs needs to take the path 'main->key_1->key_2'.
Naturally, there will be limitations. In this case:
- The path MUST include a 'LIST' parameter for FTheDocs to iterate over. Whether that list is a list of dictionaries or a list of strings doesn't matter
- Multiple paths in a single run can NOT be specified
It is also worth knowing that to mark the end of the path (aka 'here is the list of dictionaries'), you should add a LIST parameter.
Alternatively, if you have many lists of lists and only wish to use one of them, LIST can be indexed like any list (so LIST[0] selects the first element).
Too much to swallow? You bet. Go ahead and open the test.json file for me. I'll show you a real example.
In here, you'll see there is a main dictionary, which has a "css" key, whose value is a LIST. Inside that LIST is another LIST (only one, though). That inner LIST is a list of dictionaries with different HTML element properties. What we are looking for is the "content" key of those dictionaries.
What will be the json path of this file?
Think about it for a second...
That would be "css|LIST[0]|LIST|content". First, FTheDocs goes into 'css', where it finds a LIST. We only need the 0th element, so we specify 'LIST[0]'. After that is another LIST, this one containing all the dictionaries, and the key we need from each of them is 'content'.
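If it helps, here is a rough sketch of how a path like this could be walked. It is not FTheDocs' actual parser: `resolve_path` is a made-up helper, and `sample` is a tiny hypothetical structure shaped like the description above, not the real contents of test.json.

```python
import re


def resolve_path(data, path):
    """Follow a '|'-separated path and return every value it reaches."""
    current = [data]  # all nodes reached so far
    for step in path.split("|"):
        indexed = re.fullmatch(r"LIST\[(\d+)\]", step)
        next_nodes = []
        for node in current:
            if step == "LIST":
                # Plain LIST: fan out over every element of the list.
                next_nodes.extend(node)
            elif indexed:
                # LIST[n]: keep only the element at index n.
                next_nodes.append(node[int(indexed.group(1))])
            else:
                # Anything else is treated as a dictionary key.
                next_nodes.append(node[step])
        current = next_nodes
    return current


# Hypothetical data: a "css" key holding a list that holds one list of dicts.
sample = {
    "css": [
        [
            {"tag": "p", "content": "first paragraph"},
            {"tag": "h1", "content": "a heading"},
        ]
    ]
}
print(resolve_path(sample, "css|LIST[0]|LIST|content"))
```

The idea is that a plain key or LIST[n] narrows down to one value, while a bare LIST is the step that turns one node into many, which is why the path needs one to iterate over.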
You can also specify a starting and ending point for the part of the document to be added. In normal cases this acts as a line limit. It is also set through a command-line argument.
The syntax goes as: `starting_point:ending_point`
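A minimal sketch of what parsing such a value could look like is below. The helper names are hypothetical, and the details are assumptions on my part: it treats the points as zero-based line indexes with an exclusive end (like a Python slice) and lets either side be left empty, which may not match what FTheDocs actually does.

```python
def parse_range(spec):
    """Split 'starting_point:ending_point' into two optional integers."""
    start_text, end_text = spec.split(":")
    start = int(start_text) if start_text else None
    end = int(end_text) if end_text else None
    return start, end


def limit_lines(lines, spec):
    """Keep only the lines between the starting and ending point."""
    start, end = parse_range(spec)
    # Assumption: zero-based start, exclusive end, as in a Python slice.
    return lines[start:end]


lines = [f"line {i}" for i in range(10)]
print(limit_lines(lines, "2:5"))
```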
CUSTOMIZING (kinda)
There are quite a few things you can change. Some of them directly impact the results. Others not so much, or not at all.
- Changing the collection name
- Changing the document ID name
- Changing the parsing separator
- Changing the amount of queried results
- Concatenating a set number of documents
- Changing the concatenating character
- Showing these settings when building the collection
- Collection name: changes the name of the created collection.
- Document ID name: changes the string that each document ID starts with.
- Parsing separator: changes the splitting criterion for each line. So if this is a '.', each line of the text/json file will be split further on each '.'. Can be useful for more separation.
- Amount of queried results: changes the number of results returned to the user, usually sorted by how closely they match. NOTE: setting this to anything other than 1 will disable the 'Query around' functionality (explained later on).
- Concatenating documents: you can also concatenate an integer number of docs together to create a bigger document. Think of it as the opposite of the parsing separator.
- Concatenating character: concatenate on a custom character. E.g. the documents are 'document 1' and 'document 2', and the character is '. ' (spaces matter). The concatenated document will be 'document 1. document 2'.
- Showing settings: doesn't do much. It just displays the settings when building the collection, in case you realise you messed up a setting or two.
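To make the separator and concatenation settings a bit more concrete, here is a small sketch of the idea. The function names are hypothetical and this is not FTheDocs' own code. Stripping whitespace, dropping empty pieces and numbering the IDs from 0 are assumptions made for the example.

```python
def split_on_separator(lines, separator):
    """Split every line further on the parsing separator."""
    pieces = []
    for line in lines:
        for piece in line.split(separator):
            piece = piece.strip()  # assumption: surrounding spaces are dropped
            if piece:  # assumption: empty pieces are skipped
                pieces.append(piece)
    return pieces


def concatenate(docs, group_size, joiner):
    """Join every group_size consecutive documents with the joiner."""
    return [
        joiner.join(docs[i:i + group_size])
        for i in range(0, len(docs), group_size)
    ]


def make_ids(docs, prefix):
    """Build one ID per document, each starting with the same string."""
    return [f"{prefix}{i}" for i in range(len(docs))]


lines = ["document 1. document 2", "document 3"]
docs = split_on_separator(lines, ".")
print(docs)
print(concatenate(docs, 2, ". "))
print(make_ids(docs, "doc_"))
```

Splitting gives smaller, more focused documents, while concatenating gives each document more surrounding context, so the two settings pull in opposite directions.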
Those were the basics of FTheDocs. In case you don't understand something OR run into a problem, you can always open an issue, and I'll make sure to find some time to respond :)
- No file types supported other than .txt and .json
- Cannot save a previous collection
- Cannot scrape the documentation for you
- Cannot format the documentation for you (in json)
- It isn't an AI, so you can't ask it anything
- Cannot give you emotional support (I tried)