An interesting (unscientific) experiment on #MathOverflow from a few months ago, in which a user gave o1 fifteen different MO problems to answer, with the aim of verifying and then rewriting the answer into a presentable form whenever the AI-generated answer was correct. The outcome was: one question answered correctly, verified, and rewritten; one question given a useful lead, which led the experimenter to find a more direct answer; one possibly correct answer that the experimenter was not able to verify; and the remainder described as "a ton of time consuming chaos", in which the experimenter spent much time trying to verify a hallucinated response before giving up. https://meta.mathoverflow.net/questions/6114/capabilities-and-limits-of-ai-on-mathoverflow This success rate largely tracks with my own experience with these tools. At present this workflow remains less efficient than traditional pen-and-paper approaches; but with some improvement in the success rate, and (more importantly) an improved ability to detect (and then reject) hallucinated responses, I could see one soon reaching a point where a non-trivial fraction of the easier problems on MO could be resolved by such a semi-automated method.
I also found the post's discussion of possible AI disclosure policies for MO to be interesting.