AI Powered Bookshelf

Bookshelf is a Generative AI application built as a rudimentary, but fairly capable, RAG implementation written in python. It can use an open source LLM model (running locally or in the cloud) or a GPT model via OpenAI’s API.

  • The application is created using streamlit.
  • I used llama-index for orchestrating the loading of documents into the vector database. Only TokenTextSplitter is currently used. It does not optimize for PDF, html and other formats.
  • ChromaDb is the vector database to store the embedding vectors and metadata of the document nodes.
  • You can use any open source embeddings model from HuggingFace.
  • Bookshelf will automatically use the GPU when creating local embeddings, if the GPU is available on your machine.
  • You can use OpenAI embeddings as well. There is no way to use a specific OpenAI embedding model or configure the parameters yet.
  • Use OpenAI API or any OpenAI compatible LLM API (using LMStudio, Ollama or text-generation-webui) of your choice.
  • There is a live demo on streamlit cloud – https://bookshelf.streamlit.app/
  • The demo allows only OpenAI integration. You can run it locally for accessing Open Source embedding models and LLMs.

Live demo – https://bookshelf.streamlit.app/

You will need your OpenAI api key for the demo.

If you are running it locally, you will have the option of using an Open Source LLM instance via an API Url. In the screenshot, I am using an open source Embedding Model from HuggingFace (sentence-transformers/all-mpnet-base-v2) and The local LLM server at http://localhost:1234/v1

Collections tab shows all collections in the database. It also shows the names of all the files in the selected collection. You can inspect individual chunks for the metadata and text of each chunk. You can delete all contents of the collection (there is no warning).

You can modify the collection name to create a new collection. Multiple files can be uploaded at the same time. You can specify if you want to extract metadata from the file contents. Enabling this option can add significant cost because it employs Extractors which use LLM to generate title, summaries, keywords and questions for each document.

On the Retrieve tab, you can query chunks which are semantically related to your query.

On the Prompt tab, you can prompt your LLM. The context as well as the Prompt Template is editable.

Here is an example of using the context retrieved from chunks in the Vector database to query the LLM.

This inference was performed using Phi3 model running locally on LMStudio.

Code is on Github – https://github.com/ashtewari/bookshelf

Have fun!

KeyNode with Node.js and Microsoft Azure

KeyNode is a application to issue and verify software license keys. Technology stack for KeyNode is Node.js, MongoDB and Microsoft Azure.

I had built this functionality with C9.io (a cloud-based IDE with a built-in source code repository and debugger), mongohq (MongoDB as a service – now part of compose.io) and appfog (Cloud PAAS built on top of CloudFoundry). It used SMTP/gmail to email license files. That was the version I created a couple of years ago to issue tamper-proof signed xml license files for CodeDemo (a code snippet tool for developers, presenters and instructors).

For KeyNode (open source) I switched to a different toolset : Visual Studio Code and Windows Azure, simplified the code to remove signed xml file and open-sourced it on GitHub. Signed xml allowed offline verification in CodeDemo (a Wpf/Desktop app). Removing signed xml requires verification to happen online. I am working on adding the web endpoint for verification of license keys. This version uses SendGrid to email license keys. KeyNode is deployed as a Windows Azure Web App. The Azure Web App is on Continuous Deployment feed from the source code repository on GitHub.

I created and tested this Node.js application locally without IIS and deployed it as an Azure Web App without making any changes to the code at all. Node.js applications are hosted in Azure under IIS with iisnode. Iisnode is a native IIS module that allows hosting of node.js applications in IIS on Windows. Read more about iisnode here. Iisnode architecture also makes it significantly easier to take advantages of scalability afforded by Azure.

KeyNode is a work in progress. My plan is to use this as the basis for further explorations in the following areas :

  • DevOps, Docker and Microservices (at miniature scale of course!)
  • Create a Web UI with Express (a Node.js web application framework)
  • Integrate with Azure Storage/Queues
  • and more…

I invite you to check out the live site on Azure and fork it for your own experiments : KeyNode on GitHub.

Resources :

Photo Credit : Piano Keyboard (www.kpmalinowski.pl)

Using DbUpdater with MySql

DbUpdater can be used with mysql.

Here are the files you need to jumpstart the integration – DbUpdater-MySql.zip

1. You might need to change the path to mysql.exe in mysql-exec.bat.

2. Modify values in mysql-sample-command-line.bat.

Make sure mysql.data.dll is placed in GAC or in the same directory as DbUpdater.exe. Connector binaries can be downloaded from here –
http://dev.mysql.com/downloads/connector/net/6.1.html

Create Games From Scratch

My 18 month old son woke me up at 4 in the morning. I got him to go back to sleep, but … and here I am writing.

A few weeks ago, I downloaded Scratch from M.I.T’s website. Scratch is a programming language for kids. You drag/drop and (literally) snap together programming constructs to create games. Within a couple of minutes, I had a diver chasing a ball bouncing around on the screen.

I showed it to my 5 year old son and he watched as I “changed the game” and had the ball chase the mouse cursor and the diver chase the ball. He sat down with me and asked me if I could make it a dog chasing the ball instead. Sure! Then we started customizing further. Make it run faster .. Yeah! Let’s turn him into Clifford – The Big Red Dog … Can he bark ? Sure. Let’s turn it into a basket ball .. I want it red, blue, yellow, … Let’s make a basket ball hoop and have Clifford play basketball … Sure.

It was a quite productive pair programming session 🙂 . No TDD yet – he is too young for that kind of stuff 🙂 .

I was relieved that I didn’t loose his attention while I was “programming”. Now, my son does have a pretty long attention span for his age, but a lot of credit goes to the designers of Scratch. The entire “artwork” was created within the Scratch application and the “programming” was just a pure joy for my son. And we were just “scratching” the surface there. Explore the possibilities here.

You can download the game here : Clifford-ball.sb. Be careful, sometimes the dog turns upside down. There are some bugs to be fixed 🙂 . Where is the debugger ?

Using NHibernate With SQLite in DbUpdater

DbUpdater (version 1.3 onwards) can be used with sqlite3.

Review the files included in DbUpdater-sqlite3.zip

1. DbUpdater.exe.config must be modified. Note changes to the following keys – exe-file, exe-args, dialect, driver_class, connetion_string.
2. SchemaVersion.hbm.xml must be modified. The ‘class’ attributes value for the ‘generator’ of ‘id’ must be changed to ‘identity’. No other changes are needed here.
3. \SqlScripts\schema-version-table.sql is modified. Note that the datatypes of all columns are changed to TEXT, except for SchemaChangeId, which is changed to integer primary key. integer primary key is used for auto-incremented columns in sqlite.
4. \SqlScripts\baseline.sql is modified. Note the use of current_timestamp function. The actual sql syntax in this file must conform to sqlite dialect.
5. All other .sql scripts in \SqlScripts directory are also modified to conform to sqlite dialect.

The following files are expected to be in DbUpdater.exe directory (as configured in the DbUpdater.exe.config file) –

1. sqlite3.exe : download.
2. System.Data.SQLite.dll : download.
3. createdb.bat file : This file is included. This batch script will call sqlite3.exe to execute sql queries.

Another very useful tool for working with sqlite database files is SQLite Database Browzer.