More CAPTCHA-related fun (bug 4513)
Aug. 17th, 2012 09:20 am![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
As some know, I've been working on the remaining bit of bug 4513 - that text CAPTCHAs always fail when non-logged-in OpenID users try to comment on a post. It's taken me somewhat further down the rabbit-hole than I expected ;-) I wanted to discuss two possible routes to fixing it - not sure whether such discussion is better here or on bugzilla?
So, background follows (and thanks to people on IRC last night, especially Dre, for helping me figure much of this out).
Initially I thought to fix this by getting the textcaptcha handler to look at the form it is given rather than doing its own thing. When I did this, the OpenID special case started working but everything else broke, some of the time ;-) After a lot of "WTF" I realised why Fu had originally written the textcaptcha handler the way she had:
Textcaptchas can have more than one correct answer (e.g. "7" and "seven"). When a form containing a captcha with two correct answers is received by a BML page and translated into the %POST variable, the hashes of the two correct answers are concatenated together, separated by a null character[1]. Apparently when the form is received by a TT page, something different will happen. Grabbing the form directly from the http request means that things will be in a consistent format regardless of BML/TT/whatever, so doing it that way avoided complexity. Except that it doesn't work with OpenID ;-)
So, as I see it there are two possible ways to fix:
Option 1 intuitively seems more elegant to me, simply because once we no longer support BML we can remove the relevant code from one place (the textcaptcha handler) and be left with consistent handling of all captchas site-wide.
Option 2 was generally favoured on IRC last night, and as I said there I'm quite happy to accept this advice from those with vastly more expertise and knowledge, so long as I understand the reason :-) However, note that with this approach comment forms will always remain a special case - even after BML is done away with - because of the need to allow for OpenID redirects.
Either way, I need advice:
If implementing Option 1, how does TT behave when it encounters a form with two fields of the same name? I would need to figure out how to tell the difference in what the textcaptcha handler receives so as to treat it accordingly. Are there any TT-based forms that use captchas as yet, to allow for testing?
If implementing Option 2, how can the textcaptcha handler know what page it is being called from? There is already a $page parameter that is used in most places (for instance, when a captcha is included on the lostinfo page, $page="lostinfo" is passed). However, when a captcha is included on the commenting form, $page=undef. I don't know whether I can just change this to something informative, or whether there is a reason for keeping it undef (possibly because in this instance the captcha is dealt with by a .pm file rather than directly in the BML)
So... comments, advice, and "WTF are you talking about, this doesn't make sense", all welcome. Thanks in advance :-)
PS: I realise that this is a tiny edge-case that I'm trying to fix, and that experienced people would probably have Just Fixed It. But, I want to get it right ;-)
EDIT: Just realised that the above is probably made even less comprehensible by the fact that I'm using "hash" in the md5 sense rather than the perl sense, most of the time. Words, they mean multiple different things. How inconvenient ;-)
[1] No, I didn't know what a null character was until yesterday...
So, background follows (and thanks to people on IRC last night, especially Dre, for helping me figure much of this out).
- When a form includes a captcha, and that captcha needs verifying to see whether the right answer is given, the code that is evaluating that form uses various functions in DW::Captcha, passing the form (as a hashref, I think?) in the parameters.
- The TextCAPTCHA handler ignores the form that it is passed, and simply looks up the form submission in the http request instead.
- This works fine except for one case - where a non-logged-in OpenID user is submitting a comment. In this case, because of the way that OpenID works, the http request that led to this code being run doesn't have the form in, so the correct answer to the captcha (included in the form, hashed in a hidden field) and the given answer are lost.
Initially I thought to fix this by getting the textcaptcha handler to look at the form it is given rather than doing its own thing. When I did this, the OpenID special case started working but everything else broke, some of the time ;-) After a lot of "WTF" I realised why Fu had originally written the textcaptcha handler the way she had:
Textcaptchas can have more than one correct answer (e.g. "7" and "seven"). When a form containing a captcha with two correct answers is received by a BML page and translated into the %POST variable, the hashes of the two correct answers are concatenated together, separated by a null character[1]. Apparently when the form is received by a TT page, something different will happen. Grabbing the form directly from the http request means that things will be in a consistent format regardless of BML/TT/whatever, so doing it that way avoided complexity. Except that it doesn't work with OpenID ;-)
So, as I see it there are two possible ways to fix:
- Get the textcaptcha handler to always pay attention to the form that it is passed, but to also check whether it is being passed a form by a BML page or by a TT page, and process the captcha's correct answers differently accordingly.
- Have the textcaptcha check whether it is being used in a comment submission (which could use OpenID) or not. If it is, have it use the form that it is passed and account for what BML does to the field; if it is not, behave as it does at present and grab the form from the http request.
Option 1 intuitively seems more elegant to me, simply because once we no longer support BML we can remove the relevant code from one place (the textcaptcha handler) and be left with consistent handling of all captchas site-wide.
Option 2 was generally favoured on IRC last night, and as I said there I'm quite happy to accept this advice from those with vastly more expertise and knowledge, so long as I understand the reason :-) However, note that with this approach comment forms will always remain a special case - even after BML is done away with - because of the need to allow for OpenID redirects.
Either way, I need advice:
If implementing Option 1, how does TT behave when it encounters a form with two fields of the same name? I would need to figure out how to tell the difference in what the textcaptcha handler receives so as to treat it accordingly. Are there any TT-based forms that use captchas as yet, to allow for testing?
If implementing Option 2, how can the textcaptcha handler know what page it is being called from? There is already a $page parameter that is used in most places (for instance, when a captcha is included on the lostinfo page, $page="lostinfo" is passed). However, when a captcha is included on the commenting form, $page=undef. I don't know whether I can just change this to something informative, or whether there is a reason for keeping it undef (possibly because in this instance the captcha is dealt with by a .pm file rather than directly in the BML)
So... comments, advice, and "WTF are you talking about, this doesn't make sense", all welcome. Thanks in advance :-)
PS: I realise that this is a tiny edge-case that I'm trying to fix, and that experienced people would probably have Just Fixed It. But, I want to get it right ;-)
EDIT: Just realised that the above is probably made even less comprehensible by the fact that I'm using "hash" in the md5 sense rather than the perl sense, most of the time. Words, they mean multiple different things. How inconvenient ;-)
[1] No, I didn't know what a null character was until yesterday...