When you use the Claude 3 or ChatGPT APIs, you should get used to counting tokens and wrapping your API calls with your own analysis code. Luckily, it's easy!
For Python:
import tiktoken

def num_tokens_from_model(string: str, model_name: str) -> int:
    """Returns the number of tokens in a text string for a given model."""
    encoding = tiktoken.encoding_for_model(model_name)
    return len(encoding.encode(string))

tokens = num_tokens_from_model("Hello World", "gpt-4o")
print(tokens)
This was interesting because when I asked Claude 3 Opus how to do it, it lied to me! It told me I could use tiktoken, then lied multiple times!
Overall, I've had much better results coding with GPT-4 than with Claude, with the exception of Java and C#.
I wonder why that is?
Update at the time of writing: Claude 3 has no public tokenizer, so use the GPT-4 tokenizer as a loose approximation, or you can try the following: https://github.com/javirandor/anthropic-tokenizer
Just another reason to avoid wasting your time and stick with OpenAI over Claude (Anthropic), even on AWS Bedrock.