From Wikipedia, the free encyclopedia

GPT-J
Developer(s): EleutherAI
Initial release: June 9, 2021
License: Open-source
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters.[2]

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters.[3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]

Its architecture differs from GPT-3 in three main ways.[1]

  • The attention and feedforward sublayers are computed in parallel during training, allowing for greater efficiency.
  • GPT-J uses rotary position embeddings, which have been found to be a superior method of injecting positional information into transformers.[4][5]
  • GPT-J uses dense attention instead of the efficient sparse attention used in GPT-3.

Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same as GPT-2's.[2] It has a context window of 2048 tokens.[6]
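
Taken together, these choices give each GPT-J block a distinctive shape: a single layer norm feeds both the attention and feedforward branches, their outputs are added to the residual stream, and rotary embeddings are applied to the query and key vectors inside dense causal attention. The following sketch illustrates that layout (a minimal NumPy rendering with toy dimensions, random weights, ReLU in place of GELU, and rotation applied to every head channel; it is meant only to show the structure, not to reproduce EleutherAI's implementation):

    import numpy as np

    D_MODEL, N_HEADS, D_HEAD = 64, 4, 16   # toy sizes; the real model has 28 layers and 16 heads

    def layer_norm(x, eps=1e-5):
        mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def apply_rotary(x, base=10000.0):
        # Rotary position embeddings: rotate pairs of channels by a
        # position-dependent angle so attention scores depend on relative offsets.
        seq, dim = x.shape[-2], x.shape[-1]
        freqs = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
        angles = np.arange(seq)[:, None] * freqs                # (seq, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    def causal_attention(h, wq, wk, wv, wo):
        # Dense (full) causal self-attention with rotary embeddings on q and k.
        seq = h.shape[0]
        q = apply_rotary((h @ wq).reshape(seq, N_HEADS, D_HEAD).swapaxes(0, 1))
        k = apply_rotary((h @ wk).reshape(seq, N_HEADS, D_HEAD).swapaxes(0, 1))
        v = (h @ wv).reshape(seq, N_HEADS, D_HEAD).swapaxes(0, 1)
        scores = q @ k.swapaxes(-1, -2) / np.sqrt(D_HEAD)       # (heads, seq, seq)
        scores += np.triu(np.full((seq, seq), -1e9), 1)         # causal mask
        probs = np.exp(scores - scores.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)
        return (probs @ v).swapaxes(0, 1).reshape(seq, D_MODEL) @ wo

    def feedforward(h, w1, w2):
        return np.maximum(h @ w1, 0) @ w2                       # ReLU stands in for GELU

    def gptj_block(x, p):
        # Attention and feedforward branches are computed in parallel from the
        # same normalized input and both added to the residual stream
        # (GPT-3-style blocks apply them sequentially instead).
        h = layer_norm(x)
        return x + causal_attention(h, *p["attn"]) + feedforward(h, *p["mlp"])

    rng = np.random.default_rng(0)
    p = {"attn": [rng.normal(0, 0.02, (D_MODEL, D_MODEL)) for _ in range(4)],
         "mlp":  [rng.normal(0, 0.02, (D_MODEL, 4 * D_MODEL)),
                  rng.normal(0, 0.02, (4 * D_MODEL, D_MODEL))]}
    x = rng.normal(size=(10, D_MODEL))                          # 10 token embeddings
    print(gptj_block(x, p).shape)                               # (10, 64)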

It was trained on the Pile dataset,[2][3] using the Mesh Transformer JAX library, written in JAX, to handle the parallelization scheme.[2][7]

Performance

GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first fine-tuning the model for a specific task.[2] Nonetheless, GPT-J performs reasonably well without fine-tuning, even on translation (at least from English to French).[8]
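
As an illustration of prompt continuation, the publicly released checkpoint can be driven roughly as follows (a sketch assuming the Hugging Face transformers library and the EleutherAI/gpt-j-6b model id from its model card; the full 6-billion-parameter model needs on the order of 24 GB of memory in 32-bit precision, so half precision or a smaller substitute model may be required in practice):

    # Sketch: continue an English prompt with GPT-J via Hugging Face transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

    prompt = "The Pile is a large, diverse text dataset that"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample a continuation instead of decoding greedily.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))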

When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks.[3] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code generation tasks.[9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]

Like all LLMs, it is not programmed to give factually accurate information, only to generate the text that its training makes statistically likely.[2]
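
That behaviour can be seen directly by inspecting the model's next-token distribution (a sketch reusing the tokenizer and model objects from the previous example; the prompt and the probabilities it prints are purely illustrative):

    import torch

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits           # (batch, sequence, vocabulary) scores

    # A softmax over the vocabulary turns the final position's scores into a
    # probability for every possible next token; generation samples from (or
    # maximizes over) these probabilities, with no check that the result is true.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(token_id)!r}: {prob.item():.3f}")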

Applications

The untuned GPT-J is available on EleutherAI's website,[10] NVIDIA's Triton Inference Server,[11] and NLP Cloud's website.[12] Cerebras[1] and Amazon Web Services[13][14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as offering to host the fine-tuned models after they are produced.[15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[16][17]

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[18] NovelAI's Sigurd[19] and Genji-JP 6B[20] models are both fine-tuned versions of GPT-J. They also offer further fine-tuning services to produce and host custom models.[21]

EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[3] NLP Cloud,[12] and Databricks[18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[9][15][22]

References

  1. ^ a b c d e f Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
  2. ^ a b c d e f "GPT-J 6B". Hugging Face. Retrieved 13 June 2023.
  3. ^ a b c d "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  4. ^ Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers.
  5. ^ Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  6. ^ "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  7. ^ Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  8. ^ Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  9. ^ a b "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  10. ^ "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  11. ^ Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  12. ^ a b Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  13. ^ Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  14. ^ Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  15. ^ a b Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  16. ^ "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  17. ^ Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  18. ^ a b Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  19. ^ NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  20. ^ NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  21. ^ NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  22. ^ Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.