Since people have been making efforts to structure our knowledge for decades, and the resulting data is typically given away for free, it makes sense to use it to teach Yoko, once all the 'store piece of knowledge' methods are implemented, that is. (Even though I like the prospect of 'raising' her myself too, if only to learn from how that process goes! But that could get boring after a while...)
Online knowledge bases / ontologies
Actually, the Wikipedia page on commonsense knowledge pretty much says it all...
Data provided for download by Princeton University. It's in an annoyingly obscure format though, especially when trying to figure out the ontology relations, but this 'princeton evocation project' has some more useful stuff, including download links for the 'core' top 1000 and 5000 synsets. Useful juice for synonyms!
This seems to be the oldest 'knowledge database' made for AI purposes. See the wiki and the OpenCyc project page.
Original creator died, but this Google Code site still has some downloads with data. Nothing 'raw' it seems, although the executables can generate .txt files. Apparently the 'GAC-80K' is the most interesting database related to this, containing 80K simple statements of fact ('the sky is blue'). The files for getting the data all seem to be aimed at Linux (.deb files), but it should be possible to run them on OSX.
Never-ending language learning
Data provided for download by Carnegie Mellon University.
ConceptNet by MIT
Found here, and seems interesting! Also I should get more familiar with that 'hypergraph' thingie some knowledge rep systems keep going on about (like this one, but also OpenCog for example). Especially interesting because it puts square in my face how poor my attention to the existence of different relations between stuff has been so far. There's more to the world than 'is-a' and 'has-a' and possible actions, Wouter, wake up!
In particular their types of relations seem very interesting. To be revisited in the giving-Yoko-craploads-of-knowledge phase.
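To make the 'more than is-a and has-a' point concrete, here is a minimal sketch of storing facts as typed (subject, relation, object) triples. The relation names (IsA, UsedFor, AtLocation, PartOf) are borrowed from ConceptNet's relation vocabulary; the RelationStore class and the example facts are made up for illustration and are not how ConceptNet or Yoko actually store things.

```python
from collections import defaultdict


class RelationStore:
    """Hypothetical in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self._by_subject = defaultdict(list)

    def add(self, subject, relation, obj):
        self._by_subject[subject].append((relation, obj))

    def related(self, subject, relation):
        """Return all objects linked to `subject` through `relation`."""
        pairs = self._by_subject.get(subject, [])
        return [obj for rel, obj in pairs if rel == relation]


store = RelationStore()
store.add("knife", "IsA", "tool")
store.add("knife", "UsedFor", "cutting")
store.add("knife", "AtLocation", "kitchen")
store.add("blade", "PartOf", "knife")

print(store.related("knife", "UsedFor"))
```

Keeping the relation as plain data (a string) rather than hard-coding one method per relation type means new relation types can be added later without touching the store.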
VerbNet seems to be to my 'actions and events' stuff what WordNet is to my 'classes and instances' stuff. Sounds interesting!
Yet another nice 'knowledge base', this time based on Wikipedia! DBpedia.
... aaaand FrameNet, which seems to have even more structured info on actions and events.
Long-term goal: once Yoko has proper code for 'handling new things learned' in place, just feed her all of the above datasets and watch her dominate the world.
YAGO / YAGO2
Awesome, YAGO is a knowledge base that seems to be the biggest so far, and that also did the work of extracting from other knowledge bases like WordNet.
LISTS OF COMMON WORDS
All of the above have something in common: they are too friggin BIG. Where are the ontologies that start with 5000 most common nouns in English, and give for each one parent class and one property and value or something? For inspiration, I should parse / copy these guys:
- Apparently some old chatbot platform 'verbots' has some downloadable knowledge bases that I should perhaps get while I still can?
- Introduction to Natural Language Processing - title says it all, and it's thorough, easy to read, and awesome.
- SO question PHP POS tagger
- PHP implementation of the Brill part of speech (POS) tagger (also, I should check out that 'Brown corpus' mentioned)
- PHP NLP tools - in particular the 'greek POS tagging' stuff; an English version of that could be what we need!
- Corpora from the above PHP NLP tools - contains some fun knowledge as well, like the 'chat80' thingie linked, which has some nice geography info
- Interesting example text on nouns in English
- Academic stuff on Adjectives, with a neat classification on them among others
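The small ontology wished for above (each common noun with one parent class plus one property and value) could look roughly like the sketch below. All names here (NounEntry, ONTOLOGY, ancestors, lookup) and the three example nouns are assumptions for illustration; a real list would be filled from one of the word lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounEntry:
    noun: str
    parent: Optional[str]  # the single parent class, or None for a root
    prop: str              # the single property stored for this noun
    value: str             # the value of that property


# Hypothetical hand-written entries, keyed by noun.
ONTOLOGY = {
    entry.noun: entry
    for entry in [
        NounEntry("animal", None, "alive", "yes"),
        NounEntry("dog", "animal", "sound", "bark"),
        NounEntry("puppy", "dog", "age", "young"),
    ]
}


def ancestors(noun, ontology=ONTOLOGY):
    """Follow parent links upward, nearest ancestor first."""
    chain = []
    current = ontology[noun].parent
    # Stop at a root, at an unknown parent, or if a cycle is detected.
    while current is not None and current in ontology and current not in chain:
        chain.append(current)
        current = ontology[current].parent
    return chain


def lookup(noun, prop, ontology=ONTOLOGY):
    """Find `prop` on the noun itself or inherit it from the nearest ancestor."""
    for name in [noun] + ancestors(noun, ontology):
        entry = ontology[name]
        if entry.prop == prop:
            return entry.value
    return None  # nothing known about this property


print(ancestors("puppy"))
print(lookup("puppy", "sound"))
```

Even with only one property per noun, inheritance through the parent chain lets a 'puppy' answer questions about barking and being alive.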
Online chatbots and contests
- ProtoThinker, the program by far closest to what Yoko aspires to be I have run into yet.
- 20 Loebner entry questions
- DIVABOT contest, 25 questions for Pandora bots before they can join the Loebner prize. How would Yoko do on these types of questions?
- Robo chat challenge, inspiration for types of conversations/questions.
- Jeeney AI, a bot that doesn't seem to solely rely on AIML stuff
- Mitsuku, another nice bot that tries to have 'knowledge' and 'memory' beyond just language.
- Beyond facade pattern matching - Bruce Wilcox (on different chatbot techniques - they still all seem to focus almost exclusively on the pattern matching without 'knowledge' though)
- About the Suzette chatbot still by Bruce Wilcox, nice review of limitations on current chatbots.
- Winograd Schemas, a nice alternative to the Turing Test for checking on how well a computer 'understands' the world.
- The AIML tag reference table - it's not all just mindless pattern-response matching!
- The state of chatbots Interesting series of blog posts, especially for the many references!!! There is lots of current research out there, and this article might make it easier to get my hands on it.
- ELIZA, ALICE, Pandorabots
- Jabberwacky, Cleverbot
- SHRDLU (check out that readme file!)
- TUTOR (check out the PhD thesis where Daniel Bobrow describes it, it's a pretty cool read)
General interesting AI articles / papers
On generating jokes
- The 'soundex' concept, to generate/search/... stuff by how it sounds! Useful for generating jokes
- The Joking Computer! Somebody has already thought long and hard about mechanisms for AI funniness. The different kinds of jokes they make are particularly interesting.
- Ontology relations on a gene ontology
- Sentence patterns in English - apparently, by some division, there's not too many of them - I like that!
- OpenCog, cool, a modern-day ambitious and broadly-scoped AI project, with chatting nicely on the agenda! How I think about Yoko seems to match a bit what they describe as their 'artificial toddler'? In any case, they have awesome talking pets!
- The AIML source files. Has lots of files (in particular those named mp*.aiml it seems) with questions with answer pattern YES, so obviously we can learn about the world from these?
- Oh wait, even more (and better structured?) AIML source files - all links to files seem to be broken though :(
- Zoe: Evolution of an AIML Chatbot - Alicebot - great overview of many typical issues when building a chatbot
- Woah: Project CyN: uniting CyC with AIML - how is this thing not better-known? If a 'sounds as smart as a human' chatbot is gonna come from some direction my bet would be on this.
- From Semantic Web Knowledge to A Functional Conversational Agent: a practical Approach
- Enhancing intelligent agents with episodic memory. Sounds interesting!
- Beyond AIML by Bruce Wilcox, creator of the ChatScript language (which I should study more, in particular the ^createfact, table stuff).
- Open world assumption (OWA) versus closed world assumption (CWA) (do we reply 'no' or 'I don't know' to questions our db has nothing about?)
- Objects and complements (relevant for how we store/define types of actions)
- On Emotions: the 'PAD model' where each emotional state is represented as a point in a 3 dimensional space, with the 3 dimensions being Pleasure, Arousal, Dominance. Sounds interesting?
- List of notable AI projects
- AI meets NS - fun and famous essay on 'Artificial Intelligence meets Natural Stupidity'
- Interesting article on the Loebner Prize (first edition)
- Good sum up of what a 'good' chat bot should do
- The New Yorker - Why Can't My Computer Understand Me
- HN discussion on AI (direct link to paper). Another paper from that same Levesque dude (I like him, he makes up nice chatbot tests): Winograd Schema challenge. Challenge accepted!
- John McCarthy's Advice Taker (Wikipedia article contains link to the article). Seems like another old paper describing the kind of stuff I'm aiming for. Knowledge represented by symbols with reasoning on those symbols.
- The history of the AI debate on 7 huge infographics / maps!
- Importing big SQL files
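For the 'soundex' idea mentioned in the list above, here is a minimal sketch of the classic American Soundex coding, plus a check for whether two words sound alike (which is the bit that could help with puns). The function names soundex and sounds_alike are my own; this follows the commonly described rules (keep the first letter, code consonants as digits, collapse repeats, treat h and w as transparent, pad to four characters) and is not taken from any particular library.

```python
# Consonant groups and their Soundex digits.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")  # the first letter's code also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is coded again
    return (result + "000")[:4]  # pad with zeros or cut to length 4


def sounds_alike(a, b):
    """True when both words have the same non-empty Soundex code."""
    return soundex(a) != "" and soundex(a) == soundex(b)


print(soundex("Robert"), soundex("Rupert"))
print(sounds_alike("flour", "flower"))
```

Soundex is coarse (it was designed for matching surnames), so it will both miss some puns and suggest some bad ones; it is a cheap first filter rather than a real phonetic model.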
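The open world versus closed world question from the list above boils down to what gets answered when a fact is simply missing. A minimal sketch, assuming facts are stored as plain triples; the FACTS set, the relation names, and the ask function are hypothetical and only there to show the difference:

```python
# Hypothetical fact base of (subject, relation, object) triples.
FACTS = {
    ("penguin", "is_a", "bird"),
    ("bird", "has", "feathers"),
}


def ask(subject, relation, obj, closed_world, facts=FACTS):
    """Answer a yes/no question under the chosen assumption."""
    if (subject, relation, obj) in facts:
        return "yes"
    # Closed world: whatever is not stored is taken to be false.
    # Open world: a missing fact only means we have no information.
    return "no" if closed_world else "I don't know"


print(ask("penguin", "is_a", "bird", closed_world=True))
print(ask("penguin", "can", "fly", closed_world=True))
print(ask("penguin", "can", "fly", closed_world=False))
```

Under the closed world assumption a small knowledge base will confidently deny lots of true things, so for a chatbot that is still learning, the open world answer seems the safer default.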
More Wikipedia AI juice, with my sloppy tidbits to remember what each was all about again:
- Ontology components ... Turns out all I came up with in about>worldview is pretty much common knowledge.
- Non-monotonic logic (how to revise information when running into those darn penguins)
- Symbol grounding (what is the meaning of all those words we are tossing around)
- Chinese room (famous argument about some trivial program (i.e. a mega lookup table of responses) not being 'true' thinking)
- Good Old Fashioned AI (GOFAI)
- The Turing Test (is seeming intelligent in a conversation a good test of being intelligent?)
- You could also say Yoko is a 'rule-based system'.
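The penguin problem from the non-monotonic logic item above can be sketched with defaults that a more specific class is allowed to override. This is just one simple way to get that behaviour (most specific class wins), not a full non-monotonic logic; PROPERTIES, infer, and the example facts are made up for illustration.

```python
# Default properties per class; a subclass entry overrides its parent's.
PROPERTIES = {
    "bird": {"can_fly": True},
    "penguin": {"can_fly": False},
}


def infer(thing, prop, is_a, properties=PROPERTIES):
    """Walk up the is-a chain; the most specific class that mentions `prop` wins."""
    current = thing
    seen = set()
    while current is not None and current not in seen:
        seen.add(current)  # guard against cycles in the is-a links
        if prop in properties.get(current, {}):
            return properties[current][prop]
        current = is_a.get(current)
    return None  # nothing known


# At first we only know that Tweety is a bird.
before = {"tweety": "bird"}
# Later we learn that Tweety is in fact a penguin.
after = {"tweety": "penguin", "penguin": "bird"}

print(infer("tweety", "can_fly", before))
print(infer("tweety", "can_fly", after))
```

The conclusion about Tweety flips once the extra fact arrives, which is exactly the 'adding knowledge can retract an earlier conclusion' behaviour that makes the logic non-monotonic.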