Data analytics firm Databricks has launched a new open source AI model similar to ChatGPT. The text-generating model can be used to power chatbots, text summarisers and even basic search engines.
Dolly 2.0 is the successor to the original Dolly, launched in March; the key difference is that it is now licensed for commercial use by independent developers and companies.
Databricks cited a desire for a more ‘open and transparent’ large language model (LLM) in the AI market that allows companies to build, train and own AI-powered chatbots and other productivity apps using their own proprietary data sets.
How Dolly Became Dolly 2.0
The first Dolly was trained on datasets that contained outputs from OpenAI's models, which went against OpenAI's terms of service, so Databricks set about creating a new version.
Dolly 2.0 was trained on a set of 15,000 records generated voluntarily by thousands of Databricks employees. The set was then used to teach Dolly to follow instructions in a chatbot-like fashion, building on the GPT-J-6B open source text-generating model provided by the non-profit research group EleutherAI.
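To make the idea of instruction-following records concrete, here is a minimal sketch of what one such record might look like and how it could be rendered into a training prompt. The field names ("instruction", "context", "response", "category") mirror common instruction-tuning datasets, but the exact schema and prompt template Databricks used are assumptions here, not confirmed details.

```python
# Hypothetical illustration of an instruction-following training record.
# Field names and the prompt template are illustrative assumptions, not
# the exact format Databricks used for Dolly 2.0.

def build_prompt(record):
    """Render one instruction record into a single training-prompt string."""
    parts = ["Below is an instruction that describes a task."]
    if record.get("context"):  # optional supporting text, omitted when empty
        parts.append(f"Input:\n{record['context']}")
    parts.append(f"Instruction:\n{record['instruction']}")
    parts.append(f"Response:\n{record['response']}")
    return "\n\n".join(parts)

record = {
    "instruction": "Summarise the benefits of open source language models.",
    "context": "",
    "response": "They can be inspected, customised and run without licence fees.",
    "category": "summarization",
}

print(build_prompt(record))
```

Fine-tuning on thousands of prompts shaped like this is what nudges a base text-generation model such as GPT-J-6B towards answering instructions rather than merely continuing text.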
Dolly’s Community Collaboration
The CEO of Databricks, Ali Ghodsi, explained: “Dolly provides human-like language generation comparable to the LLMs that rely on vast amounts of data from the internet, but used on its own without further training, Dolly’s knowledge and accuracy is more limited.
“We’re committed to developing AI safely and responsibly and believe as an industry, we’re moving in the right direction by opening up models, like Dolly, for the community to collaborate on.”
Ghodsi also reiterated that he believed that making Dolly 2.0 open source was the best way forward. He added: “It gives researchers the ability to freely scrutinise the model architecture, helps address potential issues and democratises LLMs so that users aren’t dependent on costly proprietary large-scale LLMs.
“Organisations can own, operate and customise Dolly to their business.”
Limitations of Dolly 2.0
The limitations mentioned by Ghodsi are largely the same as those of GPT-J-6B, the model Dolly is based on. In practice, this means the AI generates text only in English, and it unfortunately has the potential to produce insensitive or offensive responses.
There also seems to be a lack of factual consistency, as demonstrated by certain questions receiving inaccurate or confusing responses.
In its defence, Ghodsi said that Dolly 2.0 is not intended to be the best model of its kind; rather, it is better suited to simpler applications, such as chatbots replying to customer support enquiries. It should also be able to extract information from legal documentation and generate basic code from technical prompts.