Comment by samsk

10 months ago

Nice! I have a scraper that uses XPath/XSLT extensively, and 90% of the XPath selectors have worked for years without a change. With CSS selectors I've had more problems...

ebruchez 10 months ago

CSS selectors have spent the last few decades reinventing XPath. XPath introduced right from the beginning the notion of axes, which allow you to navigate down, up, preceding, following, etc. as makes sense. XPath also always had predicates, even in version 1.0. CSS just recently started supporting :has() and :is(), in particular. Eventually, CSS selectors will match XPath's query abilities, although with worse syntax.
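
To illustrate the predicate and "navigate up" points, here is a minimal sketch using the limited XPath subset in Python's standard-library `xml.etree.ElementTree`. The sample markup and variable names are made up for the example, and the CSS equivalents in the comments are only rough analogues.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<ul>"
    "<li><a href='/a'>A</a></li>"
    "<li><span>no link</span></li>"
    "<li><a href='/b'>B</a></li>"
    "</ul>"
)

# Predicate: <li> elements that have an <a> child.
# Roughly what CSS only recently gained as li:has(> a).
with_links = doc.findall("./li[a]")

# Going "up": the parent of the <a> whose href is '/b'.
parents = doc.findall(".//a[@href='/b']/..")

print([li.find("a").text for li in with_links])
print([p.tag for p in parents])
```

ElementTree covers only a small part of XPath 1.0 (no `preceding::` or `following::` axes, for instance), so a full engine would be needed for the other axes mentioned above.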

samsk 10 months ago

The problem with CSS selectors (at least in scrapers) is also that they change relatively often compared to the (HTML) document structure; that's why XPath selectors last longer. But you are right: compared to 20-year-old XPath, CSS selectors are really worse.
masklinn 10 months ago
On the other hand:
- XPath literally didn't exist when CSS selectors were introduced
- XPath's flexibility makes it a lot more challenging to implement efficiently, even more so when there are thousands of rules which need to be dynamically reevaluated at each document update
- XPath is lacking conveniences dedicated to HTML semantics, and hand-rolling them in XPath 1.0 was absolutely heinous (go try and implement a class predicate in XPath 1.0 without extensions)
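
For reference, the usual XPath 1.0 workaround for a class predicate pads the normalized `class` attribute with spaces and searches for the padded token. A minimal sketch, assuming that idiom; the helper names `class_predicate` and `has_class` are hypothetical, and `has_class` is only a pure-Python mirror of the expression for checking the logic, not an XPath engine.

```python
def class_predicate(name: str) -> str:
    """Build the XPath 1.0 predicate body that tests for one class token."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def has_class(class_attr: str, name: str) -> bool:
    """Pure-Python mirror of the expression above (approximates normalize-space)."""
    normalized = " ".join(class_attr.split())  # collapse and trim whitespace
    return f" {name} " in f" {normalized} "    # concat() padding + contains()


# Usage: the predicate goes inside brackets, e.g. //div[ ... ].
print("//div[" + class_predicate("btn") + "]")
print(has_class("nav  btn primary", "btn"))
print(has_class("btn-large", "btn"))
```

The padding is what keeps `btn` from matching `btn-large`, which a bare `contains(@class, 'btn')` would do.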
- mdaniel 10 months ago
  
  > - XPath literally didn't exist when CSS selectors were introduced
  [citation required]
  https://www.w3.org/TR/1999/REC-xpath-19991116/
  https://www.w3.org/TR/REC-CSS1-961217
  > W3C Recommendation 17 Dec 1996, revised 11 Jan 1999
  There are various drafts and statuses, so it's always open to hair-splitting, but based only on the publication date, CSS does appear to win
bambax 10 months ago
> CSS selectors have spent the last few decades reinventing XPath
YES! This is so true! And ridiculous! It's a mystery why we didn't simply reuse XPath for selectors... it's all in there!!
- masklinn 10 months ago
  
  > It's a mystery why we didn't simply reuse XPath for selectors... it's all in there!!
  It's not really a mystery:
  > CSS was first proposed by Håkon Wium Lie on 10 October 1994. [...] discussions on public mailing lists and inside World Wide Web Consortium resulted in the first W3C CSS Recommendation (CSS1) being released in 1996
  > XPath 1.0 was published in 1999
  CSS2 was released before XPath 1.0.
  