
Comment by kawera

10 years ago

Very good improvements, thanks.

I think the dupe detection would be even more useful if done during submission.

The dupe detection software, of course, does run during submission. Clicking on a search link is something that humans have to do, though. That's for catching duplicates that escape simple URL matching.

Writing software to identify which URLs are really about the same thing and which URLs are not is a nontrivial problem. I'd love to work on solving that in the general-enough case to be useful for HN, but we shouldn't let that stop us from doing incremental things to make life easier in the short run.
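For a concrete picture of the "simple URL matching" layer, here is a minimal Python sketch. It is illustrative only: the thread does not describe how HN's dupe detector actually works. The normalization rules are assumptions: strip `www.`, drop default ports, fragments, trailing slashes, and a hypothetical list of tracking parameters, and treat http and https as the same page. `normalize_url`, `find_dupe`, and the `seen` mapping are made-up names. Note that this only catches trivially different spellings of one URL. It does nothing for the harder case described above, where genuinely different URLs are about the same story.

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed never to change which page is served.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical key for exact-match dupe checks."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep an explicit port only if it differs from the scheme's default.
    port = parts.port
    if port and DEFAULT_PORTS.get(scheme) != port:
        host = f"{host}:{port}"
    # "/story/" and "/story" are treated as the same path; "" becomes "/".
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so order doesn't matter.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_KEYS
    )
    # Assumption: http and https serve the same content; the fragment is discarded.
    canonical_scheme = "https" if scheme in DEFAULT_PORTS else scheme
    return urlunsplit((canonical_scheme, host, path, urlencode(query), ""))


def find_dupe(url: str, seen: Dict[str, int]) -> Optional[int]:
    """Return the id of an earlier submission whose normalized URL matches, if any.

    `seen` is a hypothetical index mapping normalized URL -> submission id.
    """
    return seen.get(normalize_url(url))
```

For example, `normalize_url("http://www.Example.com/story/?utm_source=x&b=2&a=1#comments")` and `normalize_url("https://example.com/story?a=1&b=2")` produce the same key, so the second submission would be flagged as a dupe of the first.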

  • Possibly you could split submission into multiple steps? The user first submits just the URL. This returns the results of the dupe detector (if any). They then confirm submission (or resubmission) in the next step.

    The first submission could also test the URL, and pre-fill the title field with the actual title of the page. The user then edits the title (if desired), and confirms to submit the page. Or bails out when they realize the submission already exists under a duplicate URL.

    This could be done within a page with AJAX, or kept as it is with multiple pages. In either case, you'd leave the final choice of title and URL with the user, and are only offering them more information (and an easy option to cancel) during the process.

  • I suspect kawera meant that the submission form could notify you via AJAX that the URL is a dupe, so you wouldn't have to bother copying and pasting the title (which can be a pain on phones and tablets).

  • Please don't make the dupe detector "too perfect". The fact is, good content sometimes doesn't get noticed the first time it is submitted.

    It is common for resubmissions (sometimes with a better title) made a few hours or days later to make it to the top of the front page.

    • > good content sometimes doesn't get noticed first time it is submitted.

      Amid all the discussion about catching dupes I can see why this wasn't clear, but the point of these changes is to let more reposts through. That's the motivation for what we've released today (see "We've adjusted the dupe detector to reject fewer URLs" above). Every good story that doesn't get attention is a loss for HN.
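The first reply's suggestion of pre-filling the title field from the page's actual title could be sketched in Python roughly as below. This is a sketch under assumptions, not how HN does it. `extract_title` is a hypothetical helper. It takes HTML that has already been fetched, so the example runs offline. It uses only the standard library's `html.parser` and keeps the text of the first `<title>` element with whitespace collapsed. As the reply notes, the result would only be a suggestion that the user can still edit before confirming.

```python
from html.parser import HTMLParser
from typing import List, Optional


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; ignore any <title> after the first.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Data may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return a suggested title for the submission form, or None if absent."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and newlines into single spaces.
    title = " ".join("".join(parser.parts).split())
    return title or None
```

Returning `None` when no title is found lets the form fall back to an empty field rather than a blank or misleading suggestion.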