
Wishes: more Drives / Write-Flag / Folder distribution / Parity

Oct 5, 2010 at 8:48 AM
Edited Oct 5, 2010 at 9:07 AM

Hello, my wishes / comments ...



1.

More than 1 Drive Letter  ....


2.

In the case of folder distribution, for now the simplest and most flexible way to manage it is to add an option to set a Write-Flag per source location.

Liquesce can then go through the list of sources from top to bottom and find the first writable drive, or return an error if none is writable (read-only mode) or all are out of space.

The free space is calculated from the writable drives only.
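The rule can be sketched in a few lines (a hypothetical Python model for illustration only; `SourceLocation`, `pick_write_target`, and `pool_free_space` are made-up names, not Liquesce's actual API):

```python
import shutil
from dataclasses import dataclass

@dataclass
class SourceLocation:
    path: str
    writable: bool  # the proposed per-source Write-Flag

def pick_write_target(sources, needed_bytes):
    """Walk the source list top to bottom and return the first
    writable location with enough free space, or None
    (read-only mode / all out of space)."""
    for src in sources:
        if src.writable and shutil.disk_usage(src.path).free >= needed_bytes:
            return src
    return None

def pool_free_space(sources):
    """Free space reported for the pool: writable sources only."""
    return sum(shutil.disk_usage(s.path).free for s in sources if s.writable)
```

Read-only sources still serve reads; they simply never appear as a write target and never contribute to the reported free space.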


3.

Another question is the meaning of FOLDER in the case of distribution when running out of space.

Sorry if I overlooked it, but I found nowhere what exactly you mean by folder.

Folder can mean ONE folder (from the point of view of each file), or a whole folder structure.

In most cases I would want to keep a whole folder structure together.


There are 2 solutions. If we have a fixed folder structure (as in most media cases), we can define a depth from which folders have to be kept together.

If we have MP3 and they are like

\GENRE\INTERPRET\ALBUM\song.mp3

\GENRE\INTERPRET\ALBUM\COVER\coverscan1.jpg

then we can say

depth 0 to keep each GENRE together, depth 1 to keep all songs of the same INTERPRET together, or depth 2 to keep only ALBUMs together ...
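The depth rule could be sketched like this (hypothetical Python helper; `keep_together_key` is a made-up name, and paths are taken relative to the source location):

```python
def keep_together_key(rel_path, depth):
    """Return the folder prefix that must stay on one drive.

    depth 0 -> group by the top-level folder (GENRE),
    depth 1 -> by GENRE\\INTERPRET,
    depth 2 -> by GENRE\\INTERPRET\\ALBUM.
    """
    parts = rel_path.strip("\\").split("\\")
    # everything up to (and including) the component at `depth`
    return "\\".join(parts[: depth + 1])
```

Two files with the same key must land on the same physical drive; the distributor would only split groups, never individual files inside a group.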


If we use a simple per-folder view, then in the example the cover could possibly be separated from the MP3s.

If we say everything below the root of the source location has to be kept together, that is a bad idea too, in the case of genre folders.


Or the folder name has to contain a special character to mark that everything below it is to be handled together.

This special character can be defined by the user and is then matched by a RegExp.

You could use ( as the marker. As an example, if you have video folders like

\Alien 1 (xxxx)\movie.mkv

\Alien 1 (xxxx)\extrathumbs\thumb1.png

and / or

\TERMINATOR\Terminator 1 (xxxx)\movie.mkv

\TERMINATOR\Terminator 2 (xxxx)\movie.mkv

and / or

\KIDS\DISNEY\Aristocats (xxxx)\movie.mkv

In this case the folder depth doesn't matter ...


Best to implement both; if both are set, then the search for the special character starts from the given folder depth, so every combination is possible ...
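Combining both rules might look roughly like this (hypothetical Python sketch; the marker is the user-defined RegExp, here `\(` as in the examples above):

```python
import re

def group_root(rel_path, depth=0, marker=r"\("):
    """Find the folder that anchors a keep-together group:
    starting at `depth`, return the first component matching the
    user-defined marker regex; fall back to the plain depth rule."""
    parts = rel_path.strip("\\").split("\\")[:-1]  # folder components only
    pat = re.compile(marker)
    for i in range(depth, len(parts)):
        if pat.search(parts[i]):
            return "\\".join(parts[: i + 1])
    return "\\".join(parts[: depth + 1])
```

So `\KIDS\DISNEY\Aristocats (xxxx)\movie.mkv` groups at the `(...)`-marked folder no matter how deep it sits, while unmarked trees fall back to the configured depth.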



I know it's not easy to realize.

It sounds easy at first sight, but if you think about it, there are still a lot of open questions about how Liquesce should act in some situations.

And I see no easy way to do it in the same task as writing a file.

Keeping folders/subfolders together, and the other 'managing/rearranging' things like NLS wrote in his post of Sep 5 at 7:08 PM, are more like post-processing / garbage-collection tasks.

I think a task/thread that is started automatically by changes or by time can do this better.

But it has to be done through Dokan and Liquesce, so the user has access to a file even if it is in the middle of a move of a big file.


Example:

One video folder has to be rearranged from location A to B by the keep-together rule;

the rearranging process starts;

in the middle of moving a 10 GB video file, the user decides to watch the movie.

The file is opened from location A. A little later the copy is done, but we can't delete the old (loc A) file because it is in use (the user is playing the video).

If I understand correctly how Dokan and Liquesce work, we can switch the file handle to the new file (loc B) at the same offset and free the old file handle so it can be deleted!?


This is only one of many things that can happen once we think about rearranging files.

What if, in the same example, the user wants to change the file? We would have to redirect writes to location B while the file is being copied!? (Writing/reading to/from file offsets in loc B that the copy hasn't reached yet!?)

Or am I completely wrong to think about things like this!?



4.
Integrating Parity / disParity (DP) ... do we need it!?

I have used disParity since the beginning and it works fine.
I don't think it's the job of Liquesce to do parity.
There is a program which does the job fine, even if development has stalled a little lately.
But there is one thing where Liquesce could possibly support disParity:
if you want to run parity on a schedule, there is the problem of write access while parity is being built.

If there were a way for the DP start script to be sure that there is no write access to the files, and to then switch to read-only,
it would be safer to schedule the task.
Something like an API or command-line tool which returns a code if no files are open in write mode, and locks writing in the same step,
or tells the script that writing is going on and building parity has to be delayed.

Another possibility is that Liquesce schedules disParity itself: after writing has finished and some waiting time has passed without writes,
Liquesce switches to read-only and calls a user-definable script; after the script ends (or a timeout), it switches back from read-only.

The best but most complicated way is to do something like volume shadowing, as Windows does:
if shadowing is on, all write attempts are written to temporary files, so disParity sees consistent data.
After disParity finishes, shadow mode is deactivated and all writes in temp are applied to the real files.

These are all jobs which could be done by the tray app.



Thanks for reading and thinking about it ....

Coordinator
Oct 5, 2010 at 9:20 AM
Below

1. More than 1 Drive Letter  ....

Yes, agreed this will be the first thing to do in Phase 2


2. A Write-Flag.

Liquesce can then go through the list of sources from top to bottom and find the first writable drive, or return an error if none is writable (read-only mode) or all are out of space.

The free space is calculated from the writable drives only.

I like it. No reason why not.


3.

This just needs a separate thread, as it covers several issues, which start to sound a little complicated.


4. Integrating Parity / disParity (DP) ... do we need it !?

I will re-interpret this as being able to launch another tool within a time window where:
- no writes are taking place,
- all sources are then set to read-only (re-using the functionality from 2),
- we wait for the scripted app to exit.
It should not be limited to just DP; other tools like MS Backup or compressors could also be used.



Thanks for reading and thinking about it ....

Thanks for the input :-)

Oct 5, 2010 at 3:48 PM
Edited Oct 5, 2010 at 4:31 PM
smurfiv wrote:
Below


4. Integrating Parity / disParity (DP) ... do we need it !?

I will re-interpret this as being able to launch another tool within a time window where:
- no writes are taking place,
- all sources are then set to read-only (re-using the functionality from 2),
- we wait for the scripted app to exit.
It should not be limited to just DP; other tools like MS Backup or compressors could also be used.
You are right, that's what I meant.
It shouldn't be limited to DP, but DP is what I specifically need it for.
Scheduled by time, or
by calling an API or the command line from an external script, or
something like an event-driven system, e.g.
  OnAccessIDLE
  OnWriteIDLE
  OnAccess
  OnWrite
  OnMount
  OnDisMount
  On......
The benefit is that there are always tasks to do in the background, where the user can work against scripts,
and usually the script has no chance to hinder those actions.
If Liquesce can block unwanted user interaction during the run of a script, Liquesce is useful even for users who don't need to join folders.
OK, your script can use WMI and an event sink to get notified of changes, but this method isn't safe: there are operations which don't create an event, especially copying a lot of small files.
And being notified does not mean you can block it.
One possible path for Liquesce could be to make it more general, something like a plugin system,
where joining folders together is only one plugin,
and another makes parity like disParity, or does backups, or sorts files/folders from an in-folder to their destination,
or whatever.

 

 

Coordinator
Oct 5, 2010 at 4:56 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Coordinator
Oct 5, 2010 at 4:57 PM
Edited Oct 5, 2010 at 5:05 PM

Item points moved to

1) http://liquesce.codeplex.com/workitem/7346

2) http://liquesce.codeplex.com/workitem/7347

3) Not done - needs to be refocused I think

4)  http://liquesce.codeplex.com/workitem/7345

Oct 6, 2010 at 7:59 PM

I don't think that a disParity integration is a good idea...

Here comes why:

The only benefit would be an easier way to configure Liquesce + disParity, but there is a big disadvantage of disParity. Since it is a so-called "snapshot RAID", it is only applicable to very, very static data. If there are some often-changed files, like Thumbs.db for example, these can destroy the consistency of the parity, and large files like multi-GB .mkvs can't be restored any more. So basically you won't be able to reconstruct 100% of your data.

The second big disadvantage of snapshot RAIDs is that during the parity calculation they are completely unprotected. FlexRAID has a workaround for this which needs a second disk for parity backup, but if a disk crashes during a snapshot parity generation (which is when it is most likely to happen, because of the high HDD load), data is lost!

So why make this compromise if it is possible to do a live parity calculation in Liquesce? That's my goal for this software, and once we have a basic set of features, I will start working on it. I want the security and dependability of a normal RAID and the flexibility of a snapshot RAID. :-D

Using Liquesce with disParity is already possible, and an integration wouldn't be a big improvement. A real killer feature would be live parity calculation!

 

Sorry to use such hard words... I like disParity and FlexRAID, but most users don't know how high the risk of data loss is.

Oct 12, 2010 at 5:25 PM

 

I don't agree completely. LIVE parity is fine: you don't have to start the parity calculation by hand or on a schedule.

But the risk is nearly the same whether parity is built live or started by hand, in the case of a crash during parity creation.

It's only the way disParity updates the parity data that is a problem.

And I don't think disParity destroys the whole parity data in the worst case; 80-90%, I think, can be rebuilt. It depends on the kind of changes.

The problem is that Roland didn't implement ways to recover as much data as possible even in the worst-case scenario.

And even with live parity there is a time where you have to overwrite data, and that's the critical point. You can keep this window as small as possible, but at the cost of performance.

And Thumbs.db is a hidden file and can be excluded from parity.

 

>> "snapshot RAID" ... it is only applicable to very, very static data

 

disParity IS for STATIC data, not for daily working data.

Most media collections are STATIC.

Once you have sorted and renamed everything as desired, there is no need to change anything.

And newly added files: yes, they are lost in case of a crash before parity is rebuilt, but the parity isn't damaged.

 

As an example, I have an IN folder where I sort and organize the files (which is NOT in the parity set), and only after finishing do I move them to my archive folders and then rebuild parity.

 

And if you have constantly changing extra data files alongside your media files:

that could be a chance for Liquesce to shine. It can separate them physically from the STATIC files and store them on another drive/folder. For the user there is no difference, because it's all joined.

And if you want to secure these files too, put them in a separate parity set. If needed, you can update this parity set more often than the STATIC data, and because there are no TBs of files, it is fast too.

 

 

 

Oct 12, 2010 at 7:04 PM

Sorry to say it so hard, but that's exactly the problem: everybody thinks it's just "OK" to use snapshot RAID with NEARLY STATIC data (because there is no 100% static data).

From the mathematical point of view: 2 data drives, 1 parity...

1 MB changes (just a file update) -> 1 MB of parity is outdated and corrupt, plus 1 MB of data on each data disk is unprotected, and on an HDD crash this 1 MB can't be restored (not even an old version or anything!). If you don't believe it, try it or calculate it.
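The arithmetic can be demonstrated with a toy one-byte-per-drive XOR parity (illustration only):

```python
# Toy snapshot parity: one byte stands in for 1 MB on each "drive".
a, b = 0b1010_1100, 0b0101_0110    # data drives A and B
parity = a ^ b                      # snapshot parity, taken now

a_new = 0b1111_0000                 # file on A changes; parity NOT yet updated
# Drive B crashes; we try to reconstruct it from A and the stale parity:
b_reconstructed = a_new ^ parity
assert b_reconstructed != b         # the affected region is unrecoverable
assert a_new ^ (a_new ^ b) == b     # only a fresh parity run would work
```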

And if a disk is going to crash, when is the risk highest? There are two high-risk moments: when the disk spins up, and when a large data transfer is performed (like a parity snapshot!). Most disks crash on spin-up or during backup!!! Think about this when you say the recalculation is the only time a snapshot RAID is unprotected (which is not true).

I had this discussion with the FlexRAID programmer and many other snapshot RAID users. Most of them agreed in the end... The risk of data loss is higher than you'd expect at first, but it is a hard thing to see ;).

 

On the claim that it is always unprotected even with live calculation: did you ever hear about raidz, Drobo, ...? I don't think raidz has bad performance *lol*, and it does parity and is 100% safe if it's configured right. It is a fact that it's possible.

 

So sorry again, but that's the fact. Believe it or not ;-) And again, to those who implemented FlexRAID and disParity: respect, you did a good job, and it's better to use a snapshot RAID than to have no redundancy for your data at all. But there is large room for improvement!

Oct 14, 2010 at 1:21 AM



Again, YES and NO ....

> NEARLY STATIC DATA ...
You can make it a LITTLE MORE STATIC, as I do; explanation near the end of this post.
And even if not, you can MAKE it static, as I suggested here somewhere.

Most files ARE static, especially the BIG files: movies / MP3s.
Only the extra data isn't static: XML, NFO, JPG, TBN, whatever.
Liquesce (or even a background script) can separate them onto different folders/drives,
and Liquesce joins them into one for the user.
So we can have -> 1 parity set with REALLY STATIC data,
and a second, MUCH SMALLER one with changing data.
There is ALWAYS a solution.



I'm not a noob; I know how parity works and I know the risks.

And parity can't replace backup.

The question is what kind of data you want to secure.

All my pictures and home videos, especially of my son, are on the disParity RAID,
AND I have a copy of them on an external hard drive,
AND I burn them to DVD from time to time,
AND a lot of it is spread to parents and family.
More redundant and decentralized backup/security can't be done.

But for music and movies, my paranoia level is much lower.
This stuff can be gotten back by downloading or ripping.
And even music/movies are partly shared with friends,
so even in the worst-case scenario, not everything is gone.

I always compare disParity with RAID5.
RAID5 is the thing most people use, or think of first, when talking about "professionally" securing data,
but the only advantage of RAID5 is high availability of data.

For me there is not much difference between RAID5 and live RAID.
OK, I know that's not true, and live RAID has a lot of advantages over RAID5.

One of the biggest problems of RAID5, in my opinion, is striping:
if things go wrong, you lose everything.
And you can't rebuild or rearrange the whole thing flexibly.
Fill a RAID5 with 20-30 TB of data; if you want to go back to NO RAID
or rearrange your hard disks, the question is: where do you temporarily put 30 TB of data!?

Another thing, and this is something RAID5 shares with live RAID:
if I am dumb enough to delete a bunch of folders, then with RAID5/live RAID, in the same moment,
the parity is updated, and there is no chance to recover the files.
OK, there is undelete, but what if the files are damaged (by software) or overwritten!?

OK, it's not such a common scenario, but I am happier and feel safer if I rebuild parity by hand,
and only when I am sure that everything is right and OK.

And which kind of crash are you talking about in your spin-up example?

As long as only ONE thing goes wrong, there is NO loss of data.

If the PC hangs and parity is damaged, nothing is broken, so I only need to rebuild parity.
If 1 parity drive or 1 data drive is broken, there is no problem either, as long as no other drive breaks.
OK, if you wait some weeks before replacing the broken drive, the risk is high,
but that's up to you.

Even if 2 drives break, only the data of the broken drives is lost (if one is the parity drive, then only one).

And live parity can't help either if 2 drives are broken (except with double parity).

Newly added files aren't secured until the next update,
and changed/deleted files prevent some other files from being recovered until the next parity update.

As long as you know the risks, it's up to you to decide when to rebuild parity.

And on FlexRAID: I tested it before disParity, but I don't like it.
Don't ask me why; it worked, but the simple way of disParity, and the fact that it's only a
simple command-line exe, nothing to install, I like it more.
I'm a fan of scripts and doing things by hand, so I have control of when and how.

If you have files which are unique, where losing them means never getting them back, like pictures and videos of family events:
don't trust any system, not RAID5, not disParity, not FlexRAID, ...
You need at least 2 backups on different media, kept in different places.

If we're not talking about unique memories or critical business data, there is no need for
99% safety (100% safety doesn't exist); it's a calculation of risk against the worth of the data.

So, as I said at first, you're right and at the same time not.

I prefer snapshot parity because it fits my needs, how I like to work, and the level of security I need,
more than other systems do.

There is nothing to say against live parity, but from MY point of view, in MY case,
there is no real benefit for me.
Even more: I lose control over the generation of parity.

PS:

And before you start talking about the risk of more than 1 broken drive
because of a manufacturing failure in a series:

I NEVER buy more than one drive at the same time in the same shop,
and most of my drives are not the same model,
so the risk is spread, at least in theory.

I have more than one parity set (a script swaps the disParity config file),
grouped by how often things change, which means -> static level.
So the risk of damaging parity is lower, and the time to update is lower too.

I use hard links (NTFS reparse points) to build my RAID sets,
so it doesn't matter what my real folder structure on the data drives looks like.

I have an HDRAID folder on C:, with a subfolder for every parity set,
and in that a RaidDrive subfolder for every data drive.
Now you can put links there to the folders you want to include in the set.

You can always change which folders your parity set consists of, add new folders, ...
It works fine, as long as you follow the rules of how parity works and don't mix different data drives
into the same RaidDrive subfolder.

I've thought a lot about these things, and tried a lot,
before I decided to do it the way I do it now.

And on RAID-Z: NICE, but not Windows, and ...

... to me it looks like it USES STRIPE SETS AT BLOCK LEVEL, which means,
for me, the same disadvantage as RAID5: you can't see / use every drive separately.
So if things go bad, they go really bad, and in case you want to change your system,
you need TEMP space to break up your RAID and free the drives.

That is one of my MAIN reasons for disParity: if I decide tomorrow that I don't like disParity,
I delete my parity drive and reuse it for something else, and that's it.
In all other cases, the question is: where the hell do I temporarily store my data to free my drives?

OK, OK, I know it was the SPEED and SAFETY of RAID-Z you talked about,
but it works in a completely different way, at block level,
so I think it's not fair to compare it with a system which works at file level.
And the file level, in my opinion, is the advantage of disParity.

I know somebody who built a 25 TB RAID5 (he has the money) and filled it up with untouched BD rips.
Then he decided that RAID5 was not the best solution for him.
The problem was: where to store the files temporarily?
So he bought another 25 TB to store them.
OK, I don't have 25 TB to store, NOW, but what about in some years?
I have to keep my solution open in every direction.

It sounds like I'm the number 1 fan of DP. That's not true: it's not open source.
Not that I could use the source to improve it. I can do scripting, I can read source code,
in most cases understand what happens, change it a little, even code simple programs, but that's all.
But others could, and in the case of DP, if Roland decides to stop development, there is nothing we can do.

I've checked every other possibility, even Linux things, LVM and a lot of even beta filesystems.
I didn't find any other filesystem-level parity.
There is only DP and FlexRAID, and a third one which is commercial and whose name I don't remember.



-------


But to come back to the REASON WHY this discussion started:

I never talked SPECIFICALLY about integrating DP.
It started with my suggestion for:

1.
A way that
poor scripters like me ;-)
can do background tasks on the data of the virtual drive without fear of user write operations.
Which means a way for a script to check whether there are write operations and, if not, switch the drive to read-only.

2.
Something like an event system to call scripts in cases like

OnReadIDLE
OnWriteIDLE

which is another way to implement 1.

It's not LIMITED to DP; rather, it's useful for every kind of background script
that wants to do something with the files that Liquesce uses for the virtual drive.

I said that it would be useful for DP in MY case.


OK, but now this post is definitely too long,
and I'm starting on some extra stuff, so I'll split it into another post: FRAMEWORK / PLUGIN.

Oct 14, 2010 at 8:25 AM

OK, just to make that clear, in short:

1.) I'm pro scripting interface, but I don't want to code it, and I don't want Liquesce to perform worse because of such a feature. If somebody wants to implement it, do it... Maybe one day I'll use it too for other things.

2.) I was never talking about double drive crashes. I was just comparing things I've planned for Liquesce with things we already have. I can't see why you say that RAID5 is known as a professional solution. RAID5 is old-fashioned and inflexible! By the way, some statistics: did you know that RAID5 has led to data loss more often than it has prevented it, because of wrong handling and missing backups? Of course, years ago, used by professionals, it worked, but these $50 pseudo-RAID5 controllers and half-knowing users often produced headaches... I read this in an international magazine for administrators some years ago. Anyway... The fact is that with one parity drive, a snapshot RAID can lose data on a single-disk failure, but there is a way to prevent this! That's my goal: not to make the same mistake most RAID5 users made by ignoring some risks ;).

3.) On the risks: I was just saying that while a snapshot RAID constructs its parity, it isn't protected (FlexRAID and disParity). And this is exactly the time when the risk of a disk crash is highest (high data load).

4.) This idea of a different parity strategy for different files (depending on file type or size) is interesting. I think I could use it for my plans too, in a way: static data would be secured with parity, non-static data with mirroring or something else that is faster. The question is how to decide which data is static, and even for static data I'd prefer a live parity calculation instead of a snapshot algorithm.

 

So... I won't do this implementation... I won't stop anybody who wants to implement it, as long as he/she doesn't destroy Liquesce... I have other plans for Liquesce which make this feature unnecessary from my point of view.

regards

Oct 15, 2010 at 12:29 AM

 

>> I can't see why you say that RAID5 is known as a professional solution?

But most people think that ... (not MY opinion)

And it was, in the days of battery-backed RAM controllers, not onboard-chipset ones, but there are better ways now.

 

>> 4.) decide which data is static

If we want to be exact, only the user can decide this; he knows what he stores.

In the case of movies, those are MKVs, ...

If you are a photographer, your RAW pictures / the originals normally never change.

.... what else ...

 

If we need to find an automatic way, I think, for a start, the theory that big files are more static than small ones is not so bad.

Eventually you move or rename them, but that can be recognized by checksum, and only the lists have to be corrected;

no need to create new parity. (As a LIVE system it's even easier: you KNOW that a move/rename happened, without a CRC.)

 

So one simple way is to sort the files on each data drive by size and then XOR them in that order,

so big files that change slowly or never are XORed together, and small or fast-changing ones too.

 

Or by last-modified date, but that's a scheduled optimize/reorganize task and can't be decided in the moment of change (LIVE).

 

But if I rethink it, even the file-size idea can't be done LIVE; it's an optimize-later-during-IDLE thing.

 

 

>> This idea with a different parity strategy ... non static with mirroring

If you want to do mirroring, you don't need another method.

For non-static files, instead of REAL mirroring, use the bytes of the file as the parity; the speed is nearly the same as making a copy.

That's the same as XORing it with 0 from the other drives, and it has the same effect as mirroring, but you don't have 2 separate systems (parity/mirror).

In the parity metadata, mark it with a mirror flag so it's excluded from the possible XOR-marriage list when files on other drives are added.
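The XOR-with-zero trick is easy to verify with a toy byte-level example:

```python
data = bytes([0x41, 0x7F, 0x00, 0xFF])   # a non-static file on drive A
zeros = bytes(len(data))                  # the implicit "partner" on drive B

# Parity of (file, zeros) is just the file itself -> an implicit mirror.
parity = bytes(x ^ y for x, y in zip(data, zeros))
assert parity == data

# Recovery after losing drive A works exactly like any other parity rebuild:
recovered = bytes(p ^ z for p, z in zip(parity, zeros))
assert recovered == data
```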

 

Another thing is parity strategy:

how do you decide which files you marry together?

OK, on closer inspection we would have to talk about parts of files, because we partition files into blocks, but to keep it simple I'll talk only about files.

The simplest way is: as they come.

The user adds a file on drive A.

If there are files on drive B without a partner on drive A, take the first one and XOR them, and so on for all the other data drives.

If there is no "single" file, then use zero bytes for that drive.

Now, if we want to keep the parity as small as possible, we have to marry, for example, a big file on A with a medium and some small files on B,

so we keep padding small. Padding means using the written parity ineffectively.

(Yes, there is another kind of padding, when file size mod block size <> 0.)

 

But if we say "I have a lot of space on my parity drive", then we can decide to marry only files of the same size class / static level,

even if this means that we waste parity bytes.
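A greedy version of that marriage step could be sketched like this (hypothetical; it works per file, as the post's simplification does, and only for two data drives):

```python
def marry_files(drive_a, drive_b):
    """Greedily pair (name, size) files across two data drives,
    largest first, so each parity stripe is as fully used as possible.
    Unpaired files are effectively XORed against zero bytes (pure padding)."""
    a = sorted(drive_a, key=lambda f: -f[1])
    b = sorted(drive_b, key=lambda f: -f[1])
    pairs = list(zip(a, b))
    leftovers = a[len(b):] or b[len(a):]
    # Parity size: each stripe costs the larger partner; leftovers are
    # fully "padded" (their whole size is parity against zeros).
    parity_bytes = sum(max(x[1], y[1]) for (x, y) in pairs)
    parity_bytes += sum(f[1] for f in leftovers)
    return pairs, parity_bytes
```

Pairing big with big keeps the `max(x, y) - min(x, y)` waste per stripe small, which is exactly the padding the post wants to minimize.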

 

Those are some thoughts about live parity,

how I imagine it could possibly work. Or is the way you plan to do it completely different?

 

>> 1) I don't do this implementation, ...

It's OK; ideas and suggestions are just what their name says, not a mandatory feature list.

If they bring a new point of view, spin off other ideas, or at least define a direction you don't want to go, then they are useful.