Automatic Keyword Creation: Data Structure

Hello all. I am getting more excited as I begin to get into this and see all the possibilities for where it could go. For now, though, I have to concentrate on the logic of building the keyword creation.

I would like other developers to help me, but my experience has taught me that others usually don’t come on board until a project is nearly finished. I could say more about that, but of course I welcome and need the help.

I am debating whether to build this while showing every step I take along the way, as a kind of joint venture with my friends on Live Space. The upside is that it becomes a community project and we all get to participate. The downside is that too many cooks spoil the broth, as the saying goes. But let’s just play it by ear, and I will judge from your reaction whether to discuss its development publicly.

Here is how it will work: it will take every submitted word except numbers and articles (a, an, the) and put it into a datatable called "RawWords", which will have the following columns:

WordID (the position of the word in the blog post, starting at 0 for the first word. It will be the primary key.)

WordRead (the word currently being read)

WordReadType (the part of speech the dictionary returns for the word, such as noun or verb)

WordReadCase (lowercase or uppercase. It is easier to record this once than to go back and forth in the string to determine it.)

PreviousWord (at times the type of usage is determined by the previous word. If the previous word is "with" and the current word is "Excel", we know Excel is a noun.)

PreviousWordType (the part of speech the dictionary returns for the previous word)

PreviousWordCase (if we have consecutive capitalized words, they form a proper noun and should be grouped together, e.g. Green Bay Packers. So if three consecutive rows are capitalized, we will put them together and add the result as a keyword.)
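
A minimal Python sketch of filling the RawWords rows, under several assumptions the post does not settle: the dictionary lookup is replaced by a hypothetical SAMPLE_DICTIONARY, WordID counts only the words that are kept, PreviousWord means the previous kept word (so skipped articles never appear there), and a word counts as uppercase when its first letter is a capital. The class and function names are illustrative, not an existing API.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical stand-in for the real dictionary lookup.
SAMPLE_DICTIONARY = {"with": "preposition", "open": "verb", "file": "noun"}

ARTICLES = {"a", "an", "the"}


@dataclass
class RawWord:
    word_id: int                        # WordID: position among kept words, from 0
    word_read: str                      # WordRead
    word_read_type: str                 # WordReadType
    word_read_case: str                 # WordReadCase: "upper" or "lower"
    previous_word: Optional[str]        # PreviousWord (None for the first row)
    previous_word_type: Optional[str]   # PreviousWordType
    previous_word_case: Optional[str]   # PreviousWordCase


def word_case(word: str) -> str:
    # Assumption: a word is "upper" if its first letter is a capital.
    return "upper" if word[0].isupper() else "lower"


def build_raw_words(text: str, dictionary: Dict[str, str]) -> List[RawWord]:
    rows: List[RawWord] = []
    # Split into words, allowing one apostrophe part such as "don't".
    for token in re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text):
        # Skip numbers and articles, as described in the post.
        if token.isdigit() or token.lower() in ARTICLES:
            continue
        prev = rows[-1] if rows else None
        rows.append(RawWord(
            word_id=len(rows),
            word_read=token,
            word_read_type=dictionary.get(token.lower(), "unknown"),
            word_read_case=word_case(token),
            previous_word=prev.word_read if prev else None,
            previous_word_type=prev.word_read_type if prev else None,
            previous_word_case=prev.word_read_case if prev else None,
        ))
    return rows
```

For example, "Open the file with Excel 2007" yields four rows (Open, file, with, Excel); the row for Excel has "with" as its PreviousWord and "preposition" as its PreviousWordType, which is the kind of context the rules can later use to decide that Excel is a noun.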

Then we will have a second table called "FilteredWords". We will go through the collected words and apply rules to them (see the next post). For example, if it sees three consecutive rows with capital letters, it will know they form one keyword, enter them into the new table as a single entry, and record its word type as a proper noun.
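
A minimal sketch of just the capitalization rule for FilteredWords, assuming each RawWords row is reduced to its word, type, and case, and that any run of two or more consecutive capitalized rows (set through the hypothetical min_run parameter) is merged into one proper noun. The other rules from the next post are not covered.

```python
from typing import List, NamedTuple


class RawRow(NamedTuple):
    word: str        # WordRead
    word_type: str   # WordReadType
    word_case: str   # WordReadCase: "upper" or "lower"


class FilteredRow(NamedTuple):
    word: str
    word_type: str


def build_filtered_words(rows: List[RawRow], min_run: int = 2) -> List[FilteredRow]:
    filtered: List[FilteredRow] = []
    i = 0
    while i < len(rows):
        # Find where the run of consecutive capitalized rows starting at i ends.
        j = i
        while j < len(rows) and rows[j].word_case == "upper":
            j += 1
        if j - i >= min_run:
            # Merge the whole run into one entry recorded as a proper noun.
            merged = " ".join(r.word for r in rows[i:j])
            filtered.append(FilteredRow(merged, "proper noun"))
            i = j
        else:
            # Not part of a long enough run: copy the row through unchanged.
            filtered.append(FilteredRow(rows[i].word, rows[i].word_type))
            i += 1
    return filtered
```

One caveat: the first word of a sentence is capitalized too, so a capitalized sentence opener directly before a name would be merged into it; the rules may need to account for that.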

Our keywords will be the nouns and proper nouns generated.
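
Picking the keywords can then be a simple filter over FilteredWords. This sketch assumes the rows are (word, type) pairs and that duplicates are dropped case-insensitively, which the post does not specify.

```python
from typing import List, Tuple

# Word types that count as keywords, per the post.
KEYWORD_TYPES = {"noun", "proper noun"}


def extract_keywords(filtered: List[Tuple[str, str]]) -> List[str]:
    seen = set()
    keywords: List[str] = []
    for word, word_type in filtered:
        # Keep nouns and proper nouns, first occurrence only (ignoring case).
        if word_type in KEYWORD_TYPES and word.lower() not in seen:
            seen.add(word.lower())
            keywords.append(word)
    return keywords
```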

So that’s about it. Please comment on this post and the next one, as your feedback will be important to how this is written. I am not an English major and could use all the help I can get.





Windows Live Tags: Automatic,Keyword,Creation,Data,Structure,possibilities,logic,steps,friends,Live,Space,upside,downside,broth,reaction,development,words,numbers,RawWords,WordID,word,WordRead,WordReadType,dictionary,WordReadCase,Easier,PreviousWord,times,usage,Excel,PreviousWordType,PreviousWordCase,Green,Packers,FilteredWords,rules,example,capital,English,article,column,noun
