- By quade
- 23 April 2024
Meta Coding AI Tested Against Other Programs
In recent months, the AI coding landscape has witnessed a remarkable surge in innovations as new AI models have emerged. Major tech entities are pioneering their homegrown coding models, including Meta Coding AI.
However, these developments also bring new challenges. With so many new AI models, users such as BPO services, IT support teams, and other tech companies struggle to figure out which models best suit their needs. To help with this, tests are now being conducted to compare them. Recently, ZDNet tested Meta Coding AI to see how well it performs against established models.
How the Meta Coding AI Test Works
Over the past year, ZDNet has run a series of tests to evaluate how well large language models handle various coding tasks. In each round, a selected AI model is tested against well-known LLMs. The models are given only simple tasks, on the reasoning that if they struggle with these, they will struggle with complex ones.
Conversely, successful performance could demonstrate their value and show how they could help coders in the future. Models that succeed can then undergo another round of testing. Meta Coding AI was tested against Meta Code Llama, Gemini, and ChatGPT.
Testing Results for Meta Coding AI
In a bid to gauge the efficacy of various AI models in handling fundamental programming tasks, ZDNet recently subjected Meta Coding AI, alongside other contenders, to a rigorous evaluation. They created four challenges: crafting a WordPress plugin, rewriting a string function, finding a bug, and creating a script. The ensuing analysis aimed to discern the models’ relative strengths and weaknesses, providing insights into their suitability for real-world programming endeavors.
Writing a WordPress Plugin
Of the four models, only Meta Code Llama was unable to generate a proper interface. The other three got through this step without a problem, though ChatGPT’s output stood out for its cleanliness and organization, featuring clear headings for each field. ChatGPT also placed the Randomize button more intuitively, improving the user experience.
Functionality-wise, ChatGPT was the only model to create a working plugin. Meta AI’s plugin displayed a brief flash and then a white screen, a failure commonly dubbed “The White Screen of Death” in the WordPress community. This meant that its plugin was completely unusable despite its initial appearance.
This failure is significant for the IT support, BPO web development, and programming communities because WordPress is so widely used. If Meta Coding AI is to become a major player, it will need to overcome this issue.
Rewriting A String Function
This test involves rewriting a function that handles dollars-and-cents conversions, and here the problems started to show. Meta AI had four main problems: it changed values that were already correct; it didn’t properly test for numbers with multiple decimal points; it failed completely if a dollar amount had fewer than two decimal places (in other words, it would fail with $5 or $5.2 as inputs); and it rejected correct numbers once processing was completed because it had formatted those numbers incorrectly. This is a fairly simple assignment, and one that most first-year computer science students should be able to complete.
Google Gemini was the only other model that failed this test. Meanwhile, ChatGPT and Meta Code Llama both succeeded without any problems.
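To make the failure modes above concrete, here is a minimal Python sketch of a dollars-and-cents check. It is not the function ZDNet used; the names is_valid_dollar_amount and normalize_dollar_amount are hypothetical, and the rules (an optional leading "$", at most two decimal places) are assumptions chosen to match the cases described in this section.

```python
import re

# Assumed rule set (not ZDNet's): optional "$", one or more digits, then an
# optional decimal point followed by one or two digits.
_AMOUNT_RE = re.compile(r"\$?[0-9]+(\.[0-9]{1,2})?")


def is_valid_dollar_amount(text: str) -> bool:
    """Return True if text looks like a dollars-and-cents amount."""
    return _AMOUNT_RE.fullmatch(text.strip()) is not None


def normalize_dollar_amount(text: str) -> str:
    """Return the amount padded to exactly two decimal places."""
    if not is_valid_dollar_amount(text):
        raise ValueError(f"not a valid dollar amount: {text!r}")
    # Split on the first decimal point; cents is "" when there is none.
    dollars, _, cents = text.strip().lstrip("$").partition(".")
    # Pad the cents instead of recomputing the value, so input that is
    # already correct passes through unchanged.
    return f"{dollars}.{cents.ljust(2, '0')}"
```

Under these assumptions, inputs such as 5 and $5.2 are accepted and padded to two decimals, a value like 12.34 is returned unchanged, and 5.2.5 is rejected because it contains more than one decimal point.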
Finding A Bug
In this test scenario, the AI models were tasked with diagnosing and rectifying issues within pre-existing code segments, supplemented with error data and problem descriptions. The challenge lay in finding the bad code and fixing it without affecting the rest of the system. This requires a strong understanding of the WordPress API and intricate program dynamics.
While two of the models faltered, Meta Coding AI emerged as the standout performer. Not only did it accurately identify the error, but it also proposed an efficiency-enhancing solution, which went beyond what the other models offered. The only other model that succeeded in this test was ChatGPT. This is great news for BPO and IT services like geniusOS, as the bulk of our work involves coding.
Surprisingly, Meta AI’s success contrasted sharply with its earlier failure in a seemingly simpler task of rewriting a string function. This underscores the variability in AI chatbots’ performance, emphasizing the importance of context and task-specific proficiency.
Writing A Short Script
This assessment required knowledge of the macOS scripting tool Keyboard Maestro, AppleScript, and Chrome scripting behavior. Keyboard Maestro, despite its remarkable capabilities, remains relatively niche and is developed by a solo programmer in Australia. Handling this tool well implies broad coding acumen across languages. AppleScript, native to macOS, also poses a challenge because of its limited visibility.
Unfortunately, Meta Coding AI and Meta’s Code Llama both faltered: they failed to extract data from Keyboard Maestro as instructed and showed no awareness of the tool. In contrast, Gemini and ChatGPT correctly identified it as a distinct tool and successfully retrieved the data, leaving ChatGPT as the only model to pass all four tests.
Testing AI models like this is important for our geniusOS team and other BPO services as it helps determine which models can provide the best Offshore Outsourcing Solutions. We make sure to keep a close watch on the different models to find the ones we can add to our tools. If you want to learn more about what software we use, you can reach out here.