Using APIs - Counting the tokens!

Written by Noel Victor | May 27, 2024 4:26:55 PM

When you are using the Claude 3 or ChatGPT APIs, you should get used to counting tokens and to having both wrapper code and analysis code around the API. Luckily, it's easy!

For Python:

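import tiktoken

def num_tokens_from_model(string: str, model_name: str) -> int:
    """Returns the number of tokens in a text string for the given OpenAI model."""
    # encoding_for_model expects a model name such as "gpt-4o", not an encoding name
    encoding = tiktoken.encoding_for_model(model_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

# "gpt-4o" needs a tiktoken release recent enough to know that model
tokens = num_tokens_from_model("Hello World", "gpt-4o")

print(tokens)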
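As for the wrapping side, here is a minimal sketch of a budget check you could run before sending a prompt. The names `fits_budget`, `context_limit`, and `reserved_for_reply` are my own and hypothetical, not part of any API, and `whitespace_counter` is only a stand-in so the example runs without extra packages. In practice you would pass in a real counter such as the tiktoken-based function above.

```python
from typing import Callable


def fits_budget(
    prompt: str,
    count_tokens: Callable[[str], int],
    context_limit: int,
    reserved_for_reply: int = 0,
) -> bool:
    """Return True if the prompt plus the room reserved for the reply fits the limit."""
    return count_tokens(prompt) + reserved_for_reply <= context_limit


def whitespace_counter(text: str) -> int:
    # Stand-in counter for illustration only; real tokenizers split text differently.
    return len(text.split())


# The limit and reserve values here are made-up numbers, not real model limits.
print(fits_budget("Hello World", whitespace_counter, context_limit=8, reserved_for_reply=4))
```

Taking the counter as an argument keeps the check independent of any one tokenizer, so the same wrapper works whichever model you are counting for.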

This was interesting because when I asked Claude 3 Opus how to do it, it lied to me! It said I could use tiktoken, then lied multiple times!

Overall, I've had much better results coding with GPT-4 than with Claude, with the exception of Java and C#.

I wonder why that is?

Update at the time of writing: It seems that Claude 3 has no public tokenizer, so use the GPT-4 tokenizer as a loose approximation, or try the following: https://github.com/javirandor/anthropic-tokenizer
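If you do go the loose-approximation route, one option is to pad the borrowed count with a safety margin so you are less likely to undershoot. This is only a sketch: `padded_estimate` is a hypothetical helper, and the default margin of 1.25 is an arbitrary assumption of mine, not a measured ratio between the two tokenizers.

```python
import math
from typing import Callable


def padded_estimate(
    text: str,
    count_tokens: Callable[[str], int],
    margin: float = 1.25,  # arbitrary assumed padding, not a measured value
) -> int:
    """Scale a token count from another model's tokenizer by a margin, rounding up."""
    if margin < 1.0:
        raise ValueError("margin should be at least 1.0")
    return math.ceil(count_tokens(text) * margin)


# Stand-in counter so the example runs on its own; swap in a tiktoken-based counter.
print(padded_estimate("one two three four", lambda t: len(t.split())))
```

Treat the result as a rough upper guess for budgeting, not as the number you will actually be billed for.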

Just another reason to avoid wasting your time and stick with OpenAI over Claude (Anthropic), even on AWS Bedrock.
