Pre-trained language models like ChatGPT have significantly improved code generation. As these models continue to scale, their outputs are increasingly expected to handle more intricate tasks. Moreover, in bioinformatics, generating functional programs poses additional notable challenges due to the amount of domain knowledge required, the need for complicated data operations, and intricate functional dependencies between the operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. For function-level code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1,026 functions and 1,243 methods in Python and Java from GitHub, as well as 253 examples from the Rosalind Project. BioCoder also includes a fuzz-testing framework for evaluation, which we have applied to many models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and scripts required for testing are all available at this https URL.
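The leaderboards below report Pass@k. For reference, here is a minimal sketch of the standard unbiased Pass@k estimator (Chen et al., 2021), which is presumably how these numbers are computed; the per-problem sample and pass counts in the example are placeholders, not BioCoder results.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator (Chen et al., 2021).

    n: total completions sampled for a problem
    c: number of completions that pass all tests
    k: the k in Pass@k
    """
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Illustrative only: 20 samples per problem, varying numbers of passing samples.
counts = [(20, 3), (20, 0), (20, 11)]
for k in (1, 5, 10, 20):
    score = 100 * np.mean([pass_at_k(n, c, k) for n, c in counts])
    print(f"Pass@{k}: {score:.3f}")
```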
More prompts are provided; you can copy them and run the models in the OpenAI Playground, for example as sketched below.
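Alternatively, completions can be sampled programmatically with the same decoding settings listed in the tables below (temperature 0.7, top-p 0.95, multiple samples per prompt). This is a minimal sketch assuming the openai Python package (v1+); the model name, prompt, and token limit are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompt; in practice this would be one of the BioCoder prompts.
prompt = "Complete the following Python function:\n\ndef reverse_complement(seq):\n"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4"
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,        # matches t=0.7 in the tables below
    top_p=0.95,             # matches top-p=0.95 in the tables below
    n=20,                   # sample several completions per prompt for Pass@k
    max_tokens=512,
)

completions = [choice.message.content for choice in response.choices]
```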
| Rank | Model | Details | Pass@1 | Pass@5 | Pass@10 | Pass@20 |
|---|---|---|---|---|---|---|
| 1 (Mar 14, 2023) | gpt-4 (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 38.439 | 48.491 | 50.619 | 52.229 |
| 2 (Mar 01, 2023) | gpt-3.5-turbo (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 24.682 | 33.997 | 37.132 | 40.127 |
| 3 (May 09, 2023) | StarCoder (BigCode, Li et al., '23) | Completion, t=0.7, top-p=0.95, len=8192 | 4.682 | 15.225 | 21.200 | 27.166 |
| 4 (Dec 22, 2022) | SantaCoder (BigCode, Allal et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 2.965 | 9.848 | 14.227 | 18.181 |
| 5 (Nov 08, 2022) | InCoder-6B (Facebook AI, Fried et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 1.688 | 5.320 | 8.332 | 12.006 |
| 6 (May 03, 2023) | CodeGen2-7B (Salesforce Research, Nijkamp et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0.860 | 2.494 | 3.962 | 6.242 |
| 7 (Nov 08, 2022) | CodeGen-6B (Salesforce Research, Nijkamp et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0.637 | 0.637 | 0.637 | 0.637 |
| 8 (May 15, 2023) | InstructCodeT5+ 16B (Salesforce Research, Wang et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| Rank | Model | Details | Pass@1 | Pass@5 | Pass@10 | Pass@20 |
|---|---|---|---|---|---|---|
| 1 (Mar 14, 2023) | gpt-4 (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 45.011 | 55.350 | 57.616 | 60.000 |
| 2 (Mar 01, 2023) | gpt-3.5-turbo (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 17.400 | 33.199 | 37.878 | 42.000 |
| 3 (May 09, 2023) | StarCoder+ (BigCode, Li et al., '23) | Completion, t=0.7, top-p=0.95, len=8192 | 1.300 | 5.031 | 8.042 | 12.000 |
| 4 (May 09, 2023) | StarCoder (BigCode, Li et al., '23) | Completion, t=0.7, top-p=0.95, len=8192 | 0 | 0 | 0 | 0 |
| 5 (Dec 22, 2022) | SantaCoder (BigCode, Allal et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| 6 (Nov 08, 2022) | InCoder-6B (Facebook AI, Fried et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| 7 (May 03, 2023) | CodeGen2-7B (Salesforce Research, Nijkamp et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| 8 (Nov 08, 2022) | CodeGen-6B (Salesforce Research, Nijkamp et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| 9 (May 15, 2023) | InstructCodeT5+ 16B (Salesforce Research, Wang et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0 | 0 | 0 | 0 |
| Rank | Model | Details | Pass@1 | Pass@5 | Pass@10 | Pass@20 |
|---|---|---|---|---|---|---|
| 1 (Mar 14, 2023) | gpt-4 (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 24.308 | 39.551 | 44.864 | 50.198 |
| 2 (Mar 01, 2023) | gpt-3.5-turbo (Azure OpenAI) | Completion, t=0.7, top-p=0.95, len=8192 | 23.671 | 31.953 | 36.702 | 40.725 |
| 3 (May 09, 2023) | StarCoder (BigCode, Li et al., '23) | Completion, t=0.7, top-p=0.95, len=8192 | 0.534 | 2.042 | 3.228 | 4.743 |
| 4 (Nov 08, 2022) | CodeGen-6B (Salesforce Research, Nijkamp et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0.692 | 2.088 | 3.055 | 3.953 |
| 5 (May 09, 2023) | StarCoder+ (BigCode, Li et al., '23) | Completion, t=0.7, top-p=0.95, len=8192 | 0.356 | 1.313 | 1.978 | 2.767 |
| 6 (Dec 22, 2022) | SantaCoder (BigCode, Allal et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0.158 | 0.658 | 1.075 | 1.581 |
| 7 (May 03, 2023) | CodeGen2-7B (Salesforce Research, Nijkamp et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0.059 | 0.296 | 0.593 | 1.186 |
| 8 (May 15, 2023) | InstructCodeT5+ 16B (Salesforce Research, Wang et al., '23) | Completion, t=0.7, top-p=0.95, len=2048 | 0.059 | 0.296 | 0.593 | 1.186 |
| 9 (Nov 08, 2022) | InCoder-6B (Facebook AI, Fried et al., '22) | Completion, t=0.7, top-p=0.95, len=2048 | 0.020 | 0.099 | 0.198 | 0.395 |