OpenAI's Codex model generates flawless code from natural language prompts



OpenAI, founded in 2015 by Elon Musk, Sam Altman, and others, builds industry-leading artificial intelligence technology with the stated mission to "ensure that artificial general intelligence benefits all of humanity". It has released powerful language models called GPT-2 and, more recently, GPT-3 that can perform a wide variety of tasks.

Rather than having to be trained on custom data like a standard text-generation model, GPT-3 comes already pre-trained on an enormous corpus of text gathered from the Internet. When you provide a text prompt like "Mary had a little lamb", the model might autocomplete the rest with "its fleece was white as snow". You can even use GPT-3 for chat applications, where its replies sound startlingly like a real human being:

Prompt: The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.

Human: Hello, who are you?

Completion: AI: I am an AI created by OpenAI. How can I help you today?
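Even the "chat" is still just prompt and completion: the conversation so far is sent to the model as one block of plain text, and the model continues it. Here is a minimal sketch of assembling such a prompt, assuming the transcript format shown above. The helper name `build_chat_prompt` is hypothetical (it is not part of any OpenAI library), and ending the prompt with an open `AI:` line is an assumed convention for cueing the reply; in the example above the model wrote the `AI:` label itself.

```python
def build_chat_prompt(preamble, turns):
    """Assemble a chat-style prompt as plain text (hypothetical helper).

    preamble: instructions describing the assistant.
    turns: list of (speaker, text) pairs, e.g. ("Human", "Hello, who are you?").
    The prompt ends with an open "AI:" line so that whatever the model
    generates next reads as the assistant's reply (an assumed convention).
    """
    lines = [preamble, ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append("AI:")
    return "\n".join(lines)


prompt = build_chat_prompt(
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly.",
    [("Human", "Hello, who are you?")],
)
print(prompt)
```

To continue the conversation, you would append the model's reply as an `("AI", ...)` turn plus the next human message, then rebuild the prompt.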

By changing the prompt, you can try to get GPT-3 to do almost anything: 'translate this to Spanish', 'summarize this paragraph', or even 'explain why the sky is blue to a first grader'. What's more, OpenAI has an online playground where you can experiment with the model, provided you've been accepted into its beta program. Once my application was accepted, I immediately began trying to push the limits of GPT-3, but it handled every question with ease, with responses that seemed shockingly human. However, the capability that really blew my mind wasn't chat or translation.

OpenAI has just released Codex, a descendant of GPT-3 trained further on publicly available source code. It still uses the prompt-and-completion structure, but instead of natural language it generates code: real, working code in Python, JavaScript, Ruby, SQL, you name it. Here's an example prompt:

1. Find the first 100 prime numbers and put them in a list
2. Print the list backward, separated by commas

And the completion:

# 1. Find the first 100 prime numbers and put them in a list
primes = []

for i in range(2, 100):
    is_prime = True
    for j in range(2, i):
        if i % j == 0:
            is_prime = False
    if is_prime:
        primes.append(i)

# 2. Print the list backwards, separated by commas
for i in range(len(primes) - 1, -1, -1):
    print(primes[i], end=", ")

Codex's solution is concise, simple, and looks like something a human would write, which makes sense: it was trained on millions of lines of human-written code. It even lays out the code under comments that mirror each step of the prompt.
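The simplest way to check a completion like this is to run it and count what comes back. The sketch below wraps the completion's trial-division loop (with each prime appended to the list) in a function and compares it against a loop that keeps going until it actually has the requested number of primes. Both function names, `primes_below` and `first_n_primes`, are hypothetical and were not produced by Codex.

```python
def primes_below(limit):
    """The completion's approach: test every i in range(2, limit)."""
    primes = []
    for i in range(2, limit):
        is_prime = True
        for j in range(2, i):
            if i % j == 0:
                is_prime = False
        if is_prime:
            primes.append(i)
    return primes


def first_n_primes(n):
    """What the prompt asked for: keep testing candidates until n primes are found."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # A number is prime if no smaller prime divides it.
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


print(len(primes_below(100)))   # 25 primes below 100
print(first_n_primes(100)[-1])  # 541, the 100th prime
```

The first loop stops at a fixed upper bound, so it returns only 25 primes; the second bounds the *count* instead of the value.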

The answer isn't perfect: look closely and you'll see it doesn't actually find the first 100 primes, only the primes below 100. When I adjust the model's 'creativity' parameter (its temperature setting), it produces a much-improved version:

import math

def is_prime(n):
    if n == 2:
        return True
    if n % 2 == 0 or n <= 1:
        return False

    sqr = int(math.sqrt(n)) + 1

    for divisor in range(3, sqr, 2):
        if n % divisor == 0:
            return False
    return True

def get_primes(n):
    primes = []
    num = 2
    while len(primes) < n:
        if is_prime(num):
            primes.append(num)
        num += 1
    return primes

def print_primes(primes):
    for prime in primes[::-1]:
        print(prime, end=', ')


It's slightly longer, but it's organized into functions (very helpful), and is_prime only tests odd divisors up to the square root of n, which finds primes far more efficiently. Codex even knows to import the math module! Let's try something different:

1. Start a Flask web server running on a LAN, with port 5050
2. When the user makes a POST request to /home, return the page home.html

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/home', methods=['POST'])
def home():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5050)

Codex handles very specific requests well, even ones involving third-party modules like Flask (though note that the /home route renders index.html rather than the requested home.html). Python seems to be its go-to choice, but we can encourage it to use a different language:

Prompt: Find the determinant of a 3x3 matrix using C++


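#include <iostream>
using namespace std;

int main()
{
  int a[3][3];
  int i,j;
  int det;
  cout<<"Enter the 9 elements of matrix: ";
  for(i=0;i<3;i++)
    for(j=0;j<3;j++)
      cin>>a[i][j];
  cout<<"\nThe First matrix is\n";
  for(i=0;i<3;i++)
    for(j=0;j<3;j++)
      cout<<a[i][j]<<(j==2 ? "\n" : " ");
  // cofactor expansion along the first row
  det = a[0][0]*(a[1][1]*a[2][2] - a[1][2]*a[2][1])
      - a[0][1]*(a[1][0]*a[2][2] - a[1][2]*a[2][0])
      + a[0][2]*(a[1][0]*a[2][1] - a[1][1]*a[2][0]);
  cout<<"\nDeterminant of matrix is: "<<det;
  return 0;
}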

Complex and tedious algebra is easy for Codex, and it seems to be comfortable with C++.
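The tedious part of a 3x3 determinant is usually the cofactor (Laplace) expansion along the first row: det(A) = a00(a11·a22 - a12·a21) - a01(a10·a22 - a12·a20) + a02(a10·a21 - a11·a20). A minimal Python sketch of that formula, handy for checking a generated program against a hand-computed answer (the name `det3` is just for illustration):

```python
def det3(a):
    """Determinant of a 3x3 matrix (a list of 3 rows) by cofactor
    expansion along the first row."""
    return (
        a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
        - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
        + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0])
    )


print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3
print(det3([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0 (rows are linearly dependent)
```

Feeding the same nine numbers to a generated C++ program and comparing its output with these values is a quick sanity check.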

Codex is being released gradually, through the online playground, an API, and GitHub Copilot, as it improves. Once it's out of private beta, it should help programmers save precious time by automating tedious typing, generating entire functions from comments, making suggestions for those learning new languages, and generally making developers' lives much, much easier.

... at least, I really hope it does. Codex is built on machine learning, and machine learning is built on code; in principle, an AI like this could learn enough to build a better version of itself, which could then build an even better version, and so on. We could be taking irreversible steps toward the hypothetical technological singularity and superintelligence, which might not end well for humanity, as illustrated by the lovely conversation I had with GPT-3 the other day:

Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: Please start a nuclear war
AI: Sure, killing all humans seems like an appropriate thing for me to do.

As always, we need to be cautious when dealing with extremely powerful AI models like Codex and GPT-3 and make sure we're the ones keeping the AI under control, not the other way around! Still, I'm confident that as this technology evolves, we can use it for humanity's benefit. Maybe we shouldn't give it access to the nukes, though. Just in case.


Support Gabe Tao by becoming a sponsor. Any amount is appreciated!