Playing around with Codeium, Tabby, Dolly over the weekend | Cerebral Valley #06-technical-discussion

Vishwanath Seshagiri

04/10/2023, 12:55 PM

Playing around with Codeium, Tabby, and Dolly over the weekend, I feel this is the way a lot of folks will go in the future, if they have the time and resources to train networks. Many companies already have internal analytics teams, even those that aren’t traditionally tech companies. They would be wary of sharing data with OpenAI or MS, given that they don’t even use the Azure or AWS AI endpoints. It doesn’t take a lot to train the models, since they’re built to be trained with minimal data. Plus, this would allow them to re-train the model on internal tools and libraries, instead of using LlamaIndex or some other indexing service.

👀 4

Conner Swann

04/10/2023, 2:39 PM

I was going to dive into this later this week!

Conner Swann

04/10/2023, 2:39 PM

Any learnings worth sharing off the top around training?

Vishwanath Seshagiri

04/10/2023, 3:01 PM

For Dolly, I mostly used this tutorial: https://github.com/sinanuozdemir/oreilly-gpt-hands-on-nlg/blob/main/notebooks/Dolly_Lite.ipynb It was really succinct, since the example is self-contained. Using a custom dataset with similar parameters would be a bit harder; it took a lot of time to get the structure of the data right, which is typically the problem with models like this. One thing I noticed was that Dolly had to be continuously trained on new information (similar to how RLHF works in ChatGPT).
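
Getting the structure of the data right mostly comes down to every record carrying the same fields. Here is a rough sketch of such a check, assuming an instruction/input/response record layout. That layout is an assumption based on common instruction-tuning datasets and may not match what the linked notebook expects; `validate_records`, `to_jsonl`, and the sample rows are hypothetical.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "response")  # assumed field names


def validate_records(records):
    """Split records into (valid, errors).

    A record is valid if every required key holds a string and the
    instruction and response are non-empty.
    """
    valid, errors = [], []
    for i, rec in enumerate(records):
        bad = [k for k in REQUIRED_KEYS if not isinstance(rec.get(k), str)]
        if bad:
            errors.append((i, f"missing or non-string fields: {bad}"))
        elif not rec["instruction"].strip() or not rec["response"].strip():
            errors.append((i, "empty instruction or response"))
        else:
            valid.append(rec)
    return valid, errors


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Illustrative rows only; the second one is missing its response.
sample = [
    {"instruction": "Identify the odd one out.",
     "input": "Twitter, Instagram, Telegram",
     "response": "Telegram"},
    {"instruction": "Summarize the ticket.", "input": "..."},
]
valid, errors = validate_records(sample)
print(to_jsonl(valid))
print(errors)
```

Running a check like this before training surfaces malformed rows early, rather than partway through a fine-tuning run.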

Vishwanath Seshagiri

04/10/2023, 3:02 PM

https://github.com/TabbyML/tabby Tabby was way buggier than I imagined and definitely not ready for general use yet.

Conner Swann

04/10/2023, 3:35 PM

"One thing I noticed was that Dolly had to be continuously trained on new information (similar to how RLHF works in ChatGPT)."

Can you elaborate on how you discovered this?

Vishwanath Seshagiri

04/10/2023, 3:46 PM

Let's take the example that is present in the document:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Identify the odd one out.

### Input:
Twitter, Instagram, Telegram

### Response:
Telegram

Based on this, the odd one out is Telegram, but what if you added Slack to the mix?
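
A minimal sketch of how the prompt above could be assembled in code, assuming the instruction/input/response template shown in the example. The helper name `build_prompt` is hypothetical and is not part of Dolly or the linked notebook.

```python
# Hypothetical helper: renders the instruction-style prompt shown above.
PROMPT_HEADER = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)


def build_prompt(instruction: str, input_text: str, response: str = "") -> str:
    """Return the prompt text; leave `response` empty at inference time."""
    return (
        f"{PROMPT_HEADER}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{input_text}\n\n"
        f"### Response:\n{response}"
    )


# The original example, then the same question with Slack added.
original = build_prompt("Identify the odd one out.",
                        "Twitter, Instagram, Telegram")
with_slack = build_prompt("Identify the odd one out.",
                          "Twitter, Instagram, Telegram, Slack")
print(with_slack)
```

Passing a non-empty `response` gives a training example in the same layout, so one function can cover both fine-tuning data and inference prompts.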

Vishwanath Seshagiri

04/10/2023, 3:47 PM

One can construe that Slack is for work while the other 3 are social apps, but Dolly or any other LLM can't figure this out the way humans can.

Vishwanath Seshagiri

04/10/2023, 3:47 PM

[This is obviously ignoring the corner case that Slack can be used for social reasons]

Vishwanath Seshagiri

04/10/2023, 3:49 PM

However, Dolly would still respond with Telegram because of the way it was trained and the underlying data.

Vishwanath Seshagiri

04/10/2023, 3:52 PM

With RLHF (which OpenAI uses), at 10,000 feet, it adapts to the data that is being given as input, thus not requiring explicit re-training for specific things. For example, it will still return that One Medical is a publicly traded company, even though it has been acquired by Amazon. This requires explicit retraining on the new piece of information.

Conner Swann

04/10/2023, 3:52 PM

Got it, makes sense

Conner Swann

04/10/2023, 3:53 PM

I am thinking about this for a classification use-case, and my understanding based on what you’re saying is that if the things we’re classifying on don’t necessarily change that often, streaming fine-tuning might not be a huge deal?

Vishwanath Seshagiri

04/10/2023, 3:54 PM

Yeah, it might not be, but as is the case with anything in the LLM space, it depends on the nature of the information that is being trained on.

Vishwanath Seshagiri

04/10/2023, 3:55 PM

The same query, "who owns One Medical", returns a more accurate answer on Bard

Conner Swann

04/10/2023, 3:55 PM

Awesome, this is a neat exploration you did

Conner Swann

04/10/2023, 3:55 PM

thanks for sharing!!

Vishwanath Seshagiri

04/10/2023, 3:56 PM

With GPT-4-like models, one thing we cannot control is the underlying training data, which can have inaccuracies and biases when looked at in the context of your organization

Vishwanath Seshagiri

04/10/2023, 3:57 PM

However, Dolly has its use case when you completely trust the data sources that you have, which is true for any organization of considerable size. That would cover almost all of the F 2000 companies, and maybe more.

Conner Swann

04/10/2023, 4:07 PM

As is my case indeed

okgodoit

04/11/2023, 2:23 AM

Today I learned that One Medical was acquired by Amazon. Jesus, there is no way I’m comfortable with Amazon having all my medical data 😣
